Despite its publication nearly 25 years ago, the controversial book The Bell Curve (TBC)1 has persisted in conversations about IQ. The authors, Richard Herrnstein and Charles Murray, claim that IQ strongly influences many significant life events – like whether someone will enter poverty, drop out of school, get married, or even go to jail. More contentiously, they extend the consequences of low IQ to demographics like ethnicity and nationality. As evidence, the authors build and discuss dozens of probability models using public data.
This report is in two pieces: a ~25-page written report and a ~75-page appendix with diagnostic information for the underlying TBC models. Please don’t let the total page count discourage you from reading this report! I tried to write in a style similar to TBC’s (its style alone, not its content). If a reader can understand the level of math presented in TBC, I hope she can also read this report (though I admit, perhaps with a bit more statistical background).
- Reproducible research
- Bootstrap regression
- Bias-corrected and accelerated (BCa) confidence intervals
- Automated model and report building
TBC stands out not as a stuffy academic paper, but as accessible prose intended for the general population. The book is political pop-psychology by design. TBC can be divided into three sections: a summary of prior IQ research, a presentation of new probability models derived by the authors, and a discussion of the findings, including recommendations for public policy. Befitting an accessible book, the probability models are described in simple, interpretable terms. In fact, TBC includes extensive discussion of what multivariate analysis means – from explaining what a “variable” means in statistics to what logistic regression attempts to model. The authors even include an entire appendix titled, “Statistics for People Who Are Sure They Can’t Learn Statistics”.
The technically precise description of America’s fertility policy is that it subsidizes births among poor women, who are also disproportionately at the low end of the intelligence distribution. We urge generally that these policies, represented by the extensive network of cash and services for low-income women who have babies, be ended.
Reallocate some portion of existing elementary and secondary school federal aid away from programs for the disadvantaged to programs for the gifted… At present, there is an overwhelming tilt toward enriching the education of children from the low end of the cognitive ability distribution. We propose more of a balance across the cognitive ability distribution.
The rules that currently govern immigration provide the other major source of dysgenic pressure. It appears that the mean IQ of immigrants in the 1980s works out to about 95. The low IQ may not be a problem: in the past, immigrants have sometimes shown large increases on such measures. But other evidence indicates that the self-selection process that used to attract the classic American immigrant – brave, hard working, imaginative, self-starting, and often of high IQ – has been changing, and with it the nature of some of the immigrant population.
It should be among the goals of public policy to shift the flow of immigrants away from those admitted under the nepotistic rules (which broadly encourage the reunification of relatives) and toward those admitted under competency rules…
The data used to evaluate the relationship between IQ and each of these binary targets is the National Longitudinal Survey of Youth 1979 (NLSY79)2, a public dataset made available by the Bureau of Labor Statistics. In this study, 12,686 young Americans (aged 14 – 21 at the survey’s start) of varying backgrounds were interviewed every year from 1979 through 1994. The last relevant year for TBC was 1990, although the survey continued after that time. These interviews covered a vast number of topics which were tabulated and, more recently, released online.
Because of the book’s controversy and age, much has been said about TBC. This criticism has primarily come from social psychologists and other experts in intelligence. From my perspective, their critiques remain rooted in psychology rather than statistics. E.g., from James Heckman’s “Lessons from the Bell Curve”4:
[The derived zAFQT89 field] is not the same as the g that can be extracted from test scores available in their data set… They do not emphasize how little of the variation in social outcomes is explained by AFQT or g.
AFQT is an achievement test that can be manipulated by educational interventions… A person’s AFQT score is not an immutable characteristic beyond environmental manipulation.
While there is nothing wrong with these responses, I am surprised by how few of them review the probability models described in TBC. In fact, I have found only one detailed, statistical review of a TBC model.
In summary, this web page describes the first logistic regression published in The Bell Curve, in which POVERTY status–either above or below an official level–was predicted by a model consisting of AFQT, SES, and AGE scores from a decade earlier.
I replicated the numbers published in the book’s Appendix 4. I then looked at how well the model fit: it didn’t, not at all in the published sample (N=3367), not much in another independent one (N=1067). I suspect HM didn’t realize they’d made a Type I error, that their model’s predictive accuracy for cases living below the POVERTY level was 0%…
source('200 Code/000_Misc.R')
source('200 Code/100_Resampling.R')
source('200 Code/200_Classification.R')
source('200 Code/300_Visualizations.R')
source('200 Code/400_Model_Definitions.R')
save.folder <- '100 Data'
code.folder <- '200 Code'
model.labels <- tbc.models$label
# Get filenames for needed data components
filenames.output <- file.path(save.folder, paste0(model.labels, '_outputs.rds'))
filenames.input <- file.path(save.folder, paste0(model.labels, '_inputs.rds'))
filenames.mcc.bootCI.training <- file.path(save.folder, paste0(model.labels, '_DT.mcc.bootCI.training.rds'))
filenames.mcc.bootCI.holdout <- file.path(save.folder, paste0(model.labels, '_DT.mcc.bootCI.holdout.rds'))
names(filenames.mcc.bootCI.training) <- model.labels
names(filenames.mcc.bootCI.holdout) <- model.labels
filenames.probability.bootCI.training <- file.path(save.folder, paste0(model.labels, '_DTs.probability.bootCI.training.rds'))
filenames.probability.bootCI.holdout <- file.path(save.folder, paste0(model.labels, '_DTs.probability.bootCI.holdout.rds'))
filenames.classification.bootCI.training <- file.path(save.folder, paste0(model.labels, '_DTs.mcc.classification.bootCI.training.rds'))
filenames.classification.bootCI.holdout <- file.path(save.folder, paste0(model.labels, '_DTs.mcc.classification.bootCI.holdout.rds'))
# Import needed data components
outputs <- rbindlist(
lapply(filenames.output, readRDS)
, fill = TRUE
, use.names = TRUE
)
inputs <- lapply(filenames.input, readRDS)
names(inputs) <- model.labels
DT.mcc.bootCI.training <- rbindlist(lapply(
seq_along(filenames.mcc.bootCI.training), function(x)
readRDS(filenames.mcc.bootCI.training[x])[, label := names(filenames.mcc.bootCI.training[x])]
)
, fill = TRUE
, use.names = TRUE
)
DT.mcc.bootCI.holdout <- rbindlist(lapply(
seq_along(filenames.mcc.bootCI.holdout)
, function(x) readRDS(filenames.mcc.bootCI.holdout[x])[, label := names(filenames.mcc.bootCI.holdout[x])]
)
, fill = TRUE
, use.names = TRUE
)
DTs.probability.bootCI.training <- lapply(filenames.probability.bootCI.training, readRDS)
DTs.probability.bootCI.holdout <- lapply(filenames.probability.bootCI.holdout, readRDS)
names(DTs.probability.bootCI.training) <- model.labels
names(DTs.probability.bootCI.holdout) <- model.labels
DTs.mcc.classification.bootCI.training <- lapply(filenames.classification.bootCI.training, readRDS)
DTs.mcc.classification.bootCI.holdout <- lapply(filenames.classification.bootCI.holdout, readRDS)
names(DTs.mcc.classification.bootCI.training) <- model.labels
names(DTs.mcc.classification.bootCI.holdout) <- model.labels
# Create category field
outputs[, category := factor(category, levels = unique(tbc.models$category))]
# Organize bootstrapped coefficients
DT.coef <- rbindlist(
lapply(outputs$label,
function(label.var) {
DT.label <- as.data.table(outputs[label == label.var])
DT.coef.label <- as.data.table(
coef(summary(DT.label[, glm.model][[1]]))
, keep.rownames = TRUE
)
DT.coef.label[, label := DT.label$label]
return(DT.coef.label)
}
)
)
DT.coef.pValue <- dcast(DT.coef, formula = label ~ rn, value.var = 'Pr(>|z|)')
# Find the mean modeled poverty probability (Pov89_modeled) in the zAFQT89 band closest to -2 (i.e., 2 standard deviations below the mean)
Pov89_modeled.2stdev <- inputs[['poverty']][['DT.model.training']][
, .(
prob.model = mean(Pov89_modeled, na.rm = TRUE)
)
, by = .(zAFQT89_banded = floor(zAFQT89 * 10) / 10)
][
, .(
zAFQT89_banded
, zero = abs(zAFQT89_banded - (-2))
, prob.model
)
][order(zero)][1L, prob.model]
Pov89_modeled.2stdev.formatted <- paste0(floor(Pov89_modeled.2stdev * 1e4) / 1e4 * 1e2, '%')
# Build the training confusion matrix, classifying with the 2-standard-deviation modeled probability as the cutoff
confusionMatrix.poverty <- inputs[['poverty']][['DT.model.training']][
, .(
`Did not enter poverty (classification)` = sum(as.integer(Pov89_modeled <= Pov89_modeled.2stdev))
, `Enter poverty (classification)` = sum(as.integer(Pov89_modeled > Pov89_modeled.2stdev))
)
, by = .(Pov89 = ifelse(Pov89 == 0, 'Did not enter poverty (true)', 'Enter poverty (true)'))
]
accuracy.poverty.formatted <- paste0(
floor(
as.numeric((confusionMatrix.poverty[1, 2] + confusionMatrix.poverty[2, 3]) / nrow(inputs[['poverty']][['DT.model.training']])) * 1e4
) / 1e2
, '%'
)
probabilityEnterPoverty <- outputs[label == 'poverty', response.mean.training]
probabilityEnterPoverty.formatted <- paste0(floor(probabilityEnterPoverty * 1e4)/1e4 * 1e2, '%')
mcc.poverty <- calculateMCC(
TN = confusionMatrix.poverty[1, 2] / nrow(inputs[['poverty']][['DT.model.training']])
, FP = confusionMatrix.poverty[1, 3] / nrow(inputs[['poverty']][['DT.model.training']])
, FN = confusionMatrix.poverty[2, 2] / nrow(inputs[['poverty']][['DT.model.training']])
, TP = confusionMatrix.poverty[2, 3] / nrow(inputs[['poverty']][['DT.model.training']])
)
mcc.poverty.formatted <- paste0(floor(mcc.poverty * 1e4) / 1e4 * 1e2, '%')
TNR.poverty <- as.numeric(confusionMatrix.poverty[1, 2] / (confusionMatrix.poverty[1, 2] + confusionMatrix.poverty[1, 3]))
TPR.poverty <- as.numeric(confusionMatrix.poverty[2, 3] / (confusionMatrix.poverty[2, 3] + confusionMatrix.poverty[2, 2]))
Extraordinary claims require extraordinary evidence. When a recommendation for public policy can negatively impact a group of people, material evidence is required to support the claim. In TBC, the authors cite their probability models as the reason to accept their conclusions. To assess whether these models are meaningful and reliable, the models should be evaluated from the perspectives of those who could be harmed by the underlying claims.
The technically precise description of America’s fertility policy is that it subsidizes births among poor women, who are also disproportionately at the low end of the intelligence distribution.
By saying “disproportionately”, the authors imply that there is some IQ value below which the negative consequences of their probability models become most severe and worthy of political action. More specifically, if the authors are to objectively identify individuals they think have too low an IQ, they would need to state such an IQ value. And if their probability models are to be taken seriously, then the authors should use them to clarify what “disproportionate” means. Below, I play devil’s advocate for TBC’s authors by finding such a disproportionate IQ.
plot.training.actual <- ggplot(
inputs[['poverty']][['DT.model.training']][
, .(
response = mean(Pov89)
, modeled = mean(Pov89_modeled)
)
, by = .(
zAFQT89_banded = floor(zAFQT89 * 10)/10
)
]
, aes(
x = zAFQT89_banded
, y = response
)
) +
geom_point() +
theme_bw() +
ylab(paste('%', tbc.models[label == 'poverty', target.description]))
# +
# labs(
# color = 'Category'
# , shape = 'Category'
# )
ggplotly(
plot.training.actual +
geom_line(aes(y = modeled), color = lightBlue) +
geom_segment(
x = -3
, xend = -2
, y = Pov89_modeled.2stdev
, yend = Pov89_modeled.2stdev
, linetype = 2
, color = 'red'
) +
geom_segment(
x = -2
, xend = -2
, y = 0
, yend = Pov89_modeled.2stdev
, linetype = 2
, color = 'red'
)
)
Plot of observed vs modeled probabilities across zAFQT89
While the prior section may sound trivial to a statistician or data scientist, my point is that fairness recommends a classification analysis. In this section, I go further by claiming that TBC’s probability models should be evaluated only as predictive classifications, given the statistical philosophy that best describes them.
…[Explanation and prediction] are often conflated, yet the causal versus predictive distinction has a large impact on each step of the statistical modeling process and on its consequences. Although not explicitly stated in the statistics methodology literature, applied statisticians instinctively sense that predicting and explaining are different.
At a high-level, explanatory models:
Whereas predictive models:
How do these criteria play out in TBC? Quotes from the authors reflect their thoughts on explanatory modeling in TBC:
Part II was circumscribed, taking on social behaviors one at a time, focusing on causal roles…
If after looking at a variety of these other things which both theory and common sense say should have some bearing on whether a person ends up in poverty, but one ends up with a large, statistically independent role for I.Q., it seems to me to make a causal statement that I.Q. looks like its a cause of poverty, it is a reasonable thing to do.7
By the end of the chapter [on poverty], we will have drawn a controversial conclusion. How did we get there? What makes us think that we have got our causal ordering right? We will walk through the analyses that lie behind our conclusions…
Even an inherently strong relationship can result in low values of R2 if the data points are bunched in various ways, and relatively noisy relationships can result in high values if the sample includes disproportionate numbers of outliers… We therefore consider the regression coefficients themselves (and their associated p values) to suit our analytic purposes better than R2, and that is why those are the ones we relied on in the text.
Note: The authors do have a point that R-squared is not the most meaningful statistic for every model. However, they did not adequately explain why probability models of rare events – like the poverty model, with an overall frequency of 7.24% – should ignore diagnostics at the model level.
Below, I comment on the role of predictive modeling in TBC:
The models are not materially causal because any proposed mechanism running from IQ, SES, and age to the binary targets is not direct enough. Focusing again on the poverty model, many explanations are possible as to why an individual would enter poverty. Saying IQ causes poverty would require linking IQ with the complex process that ends with the NLSY79 participant below the poverty line. Such explanations would, at best, be in the realm of associations, because considerable chance (i.e., noise) is involved.
As mentioned in the section introducing data used in TBC, most binary targets are defined to be measured nearly a decade in the future (relative to when IQ was measured). Undoubtedly, modeling such targets is a prospective process.
The probability models in TBC should prioritize the individual. In the prior section, I explained why fairness recommends the individual-focused classification analysis. Also, TBC’s authors frequently comment on the importance of acknowledging individuals and of not being judgmental: “We cannot think of a legitimate argument why any encounter between individual whites and blacks need be affected by the knowledge that an aggregate ethnic difference in measured intelligence is genetic instead of environmental.”
Overall, I claim that TBC’s probability models more closely follow predictive philosophy than explanatory, and so they should be evaluated as such.
Many of the binary targets are uncommon-to-rare in occurrence. In classification settings, this is called imbalanced data, and it can make some statistics unreliable. To counter this imbalance, I estimate the Matthews correlation coefficient (MCC), which is the correlation between the binary target and the binary classification derived from a TBC probability model. MCC can be interpreted in the same way as a regular (Pearson) correlation. The following guideline is a reasonable starting point for judging MCC values8:
| Size of Correlation | Interpretation |
|---|---|
| 0.90 to 1.00 (−0.90 to −1.00) | Very high positive (negative) correlation |
| 0.70 to 0.90 (−0.70 to −0.90) | High positive (negative) correlation |
| 0.50 to 0.70 (−0.50 to −0.70) | Moderate positive (negative) correlation |
| 0.30 to 0.50 (−0.30 to −0.50) | Low positive (negative) correlation |
| 0.00 to 0.30 (0.00 to −0.30) | Negligible correlation |
MCC is robust to imbalanced data because it accounts for all possible classification scenarios9 (i.e., in the equation below, each scenario is represented in the numerator):
\[\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\]
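To make the formula concrete, here is a minimal R sketch. The report's actual `calculateMCC()` is defined in '200 Code/200_Classification.R'; the function below is an illustrative stand-in, not that implementation.

```r
# Minimal sketch of the MCC formula above (illustration only; not the
# report's calculateMCC() from '200 Code/200_Classification.R').
mcc <- function(TP, TN, FP, FN) {
  numerator   <- TP * TN - FP * FN
  denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  if (denominator == 0) return(0)  # common convention when a margin is empty
  numerator / denominator
}

# A perfect classifier yields MCC = 1; a classifier that never predicts
# the positive class yields MCC = 0 under the convention above.
mcc(TP = 10, TN = 10, FP = 0, FN = 0)  # 1
mcc(TP = 0,  TN = 20, FP = 0, FN = 5)  # 0
```

Because every classification scenario (TP, TN, FP, FN) enters the formula, no single dominant cell can mask poor performance on a rare class.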
ggplot() +
geom_jitter(
data = inputs[['poverty']][['DT.model.training']]
, aes(
x = factor(ifelse(Pov89 == 0, 'No', 'Yes'))
, y = Pov89_modeled
)
) +
geom_hline(
yintercept = Pov89_modeled.2stdev
, linetype = 2
, color = 'red'
) +
geom_rect(
data = data.table(
xstart = rep(c(0.5, 1.5), 2)
, xend = rep(c(1.5, 2.5), 2)
, ystart = c(0, 0, Pov89_modeled.2stdev, Pov89_modeled.2stdev)
, yend = c(Pov89_modeled.2stdev, Pov89_modeled.2stdev, 1, 1)
, color = c('TN', 'FN', 'FP', 'TP')
)
, aes(
xmin = xstart
, xmax = xend
, ymin = ystart
, ymax = yend
, fill = factor(color)
)
, alpha = 0.5
) +
labs(fill = 'Classification Scenarios:') +
xlab('Under the official poverty line in 1989') +
ylab('Modeled Probability') +
theme_bw() +
theme(
axis.text=element_text(size = 16)
, axis.title=element_text(size = 16)
, legend.text=element_text(size = 16)
, legend.title=element_text(size = 16)
, legend.position="bottom"
)
Plot comparing different classification scenarios
kable(
confusionMatrix.poverty[
, .(
`Poverty Model Target` = Pov89
, `Did not enter poverty (classification)`
, `Enter poverty (classification)`
)
]
, caption = 'Table comparing different classification scenarios (i.e., confusion matrix)'
, align = rep('c', 3)
) %>% kable_styling(bootstrap_options = "striped", full_width = FALSE, position = 'center')
| Poverty Model Target | Did not enter poverty (classification) | Enter poverty (classification) |
|---|---|---|
| Did not enter poverty (true) | 3093 | 30 |
| Enter poverty (true) | 226 | 18 |
simulateConfusionMatrix.constant.TNR.TPR <- function(TNR, TPR, pctPositive) {
TP <- pctPositive * TPR
TN <- (1 - pctPositive) * TNR
output <- list(
TN = TN
, FN = pctPositive * (1 - TPR)
, FP = (1 - pctPositive) * (1 - TNR)
, TP = TP
)
return(output)
}
pctPositive.range <- seq(from = 0, to = 1, length.out = 101)
TPR.range <- seq(from = 0, to = 1, length.out = 101)
TNR.range <- seq(from = 0, to = 1, length.out = 101)
DT.demoMCC <- as.data.table(
expand.grid(
pctPositive.range
, TPR.range
, TNR.range
)
)
setnames(DT.demoMCC, paste0('Var',1:3), c('pctPositive', 'TPR', 'TNR'))
DT.demoMCC[, c('TN', 'FN', 'FP', 'TP') := as.list(simulateConfusionMatrix.constant.TNR.TPR(TNR, TPR, pctPositive))]
DT.demoMCC[, mcc := calculateMCC(TN, FN, FP, TP)]
DT.demoMCC[, accuracy := (TN + TP) / (TN + FN + FP + TP)]
DT.demoMCC[, TPR_TNR := factor(paste0(TPR,'_',TNR))]
ggplot(
DT.demoMCC[TPR == floor(TPR.poverty * 1e2) / 1e2 & TNR == floor(TNR.poverty * 1e2) / 1e2]
, aes(
x = pctPositive
, y = mcc
, color = 'mcc'
)
) +
geom_line() +
geom_line(
aes(
y = accuracy
, color = 'accuracy'
)
) +
geom_vline(aes(
xintercept = probabilityEnterPoverty
, color = 'poverty'
)
, linetype = 2
, size = 1.25
, show.legend = FALSE
) +
scale_colour_manual(
name = 'Classification Metrics:\n(TPR = 0.07, TNR = 0.99)'
, values = c(
'mcc' = 'darkred'
,'accuracy' = 'blue'
,'poverty' = 'black'
)
, labels = c('Accuracy', "Matthews Correlation Coefficient", 'Poverty Model (Training Data)')
) +
guides(
color = guide_legend(
nrow = 2
, byrow = TRUE
, override.aes = list(
linetype = c("solid", "solid", "dotted")
, size = c(1, 1, 1.25)
)
)
) +
ylab('Performance Metric Percent') +
xlab('Percent of True Observations\n(i.e., small value means rare event, large value means common event)') +
theme_bw() +
theme(
axis.text=element_text(size = 16)
, axis.title=element_text(size = 16)
, legend.text=element_text(size = 16)
, legend.title=element_text(size = 16)
, legend.position="bottom"
)
Plot comparing accuracy and Matthews Correlation Coefficient as classification metrics for imbalanced data
The most important statistical tool for this analysis is called bootstrapping10, which is used for the following:
Estimate the range of MCC values expected to contain the “true” value
Account for any abnormalities in the original data that may bias its MCC values
How does bootstrapping give us a range of MCC values? For each randomly generated bootstrapped sample, we can build a new probability model and use it to calculate an MCC value.
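That loop can be sketched on simulated data. The report's real pipeline uses `logit.bootstrap()` from '200 Code/100_Resampling.R' over the NLSY79 extract; the data, cutoff, and iteration count below are purely illustrative.

```r
# Toy bootstrap of MCC: resample, refit the logistic model, classify,
# and record one MCC value per iteration (simulated data only).
set.seed(123)
n   <- 500
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-2 + dat$x))  # a rare-ish binary target

boot.mcc <- replicate(200, {
  samp <- dat[sample.int(n, n, replace = TRUE), ]  # resample rows with replacement
  fit  <- glm(y ~ x, family = binomial, data = samp)
  p    <- predict(fit, type = "response")
  cls  <- as.integer(p > 0.15)                     # fixed probability cutoff
  TP <- sum(cls == 1 & samp$y == 1); TN <- sum(cls == 0 & samp$y == 0)
  FP <- sum(cls == 1 & samp$y == 0); FN <- sum(cls == 0 & samp$y == 1)
  den <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  if (den == 0) 0 else (TP * TN - FP * FN) / den
})
quantile(boot.mcc, c(0.025, 0.975))  # a simple percentile interval for MCC
```

The percentile interval shown is the simplest choice; the report itself uses the more refined BCa intervals.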
model <- 'poverty'
set.seed(123)
DT.boot.demo <- logit.bootstrap(
DT = inputs[[model]][['DT.model.training']]
, iterations = 2
, formula = inputs[[model]][['model parameters']][, formula][[1]]
, DT.holdout = inputs[[model]][['DT.model.holdout']]
, key = 'ID'
, parallel = FALSE
)
appendBootDemoResults <- function(DT.model, DT.boot, label, iteration) {
DT.boot <- copy(DT.boot)[, Pov89_modeled.scored := modeled_probability]
output <- DT.model[
DT.boot[
bootIteration == iteration
, mget(c('ID','Pov89_modeled.scored'))
]
, on = 'ID'
][, scoredData := label]
return(output)
}
DT.plot.boot.demo <- rbindlist(list(
appendBootDemoResults(
DT.model = inputs[[model]][['DT.model.training']]
, DT.boot = DT.boot.demo[['modeled training']]
, label = 'Scored Original Data 1'
, iteration = 1
)
, appendBootDemoResults(
DT.model = inputs[[model]][['DT.model.training']]
, DT.boot = DT.boot.demo[['modeled training']]
, label = 'Scored Original Data 2'
, iteration = 2
)
, appendBootDemoResults(
DT.model = inputs[[model]][['DT.model.training']]
, DT.boot = DT.boot.demo[['modeled bootstrap training']]
, label = 'Bootstrapped Sample 1'
, iteration = 1
)
, appendBootDemoResults(
DT.model = inputs[[model]][['DT.model.training']]
, DT.boot = DT.boot.demo[['modeled bootstrap training']]
, label = 'Bootstrapped Sample 2'
, iteration = 2
)
))
(plot.training.actual +
geom_line(aes(y = modeled)) +
geom_line(
data = DT.plot.boot.demo[
scoredData %like% 'Boot'
, .(
counts = .N
, modeled = mean(Pov89_modeled.scored)
)
, by = .(
scoredData
, zAFQT89_banded = floor(zAFQT89 * 10)/10
)
]
, aes(
x = zAFQT89_banded
, y = modeled
, color = factor(scoredData)
)
, linetype = 2
) +
guides(color = 'none')) %>%
ggplotly() %>%
layout(
legend = list(
orientation = "h"
, xanchor = "center"
, x = 0.5
, y = 0
)
)
Plot showing example of bootstrapped modeled probabilities
mcc.boot.demo <- c(
calculateMCC.boot(
DT.probabilities = inputs[[model]][['DT.model.training']]
, probabilityString = 'Pov89_modeled'
, actualString = 'Pov89'
, bootLabelString = ''
, numCutoffs = 1
, singleCutoff = 0.15
)[, mcc]
, calculateMCC.boot(
DT.probabilities = DT.boot.demo[['modeled bootstrap training']][bootIteration == 1]
, probabilityString = 'modeled_probability'
, actualString = 'actual'
, bootLabelString = ''
, numCutoffs = 1
, singleCutoff = 0.15
)[, mcc]
, calculateMCC.boot(
DT.probabilities = DT.boot.demo[['modeled bootstrap training']][bootIteration == 2]
, probabilityString = 'modeled_probability'
, actualString = 'actual'
, bootLabelString = ''
, numCutoffs = 1
, singleCutoff = 0.15
)[,mcc]
)
DT.mcc.boot.demo <- data.table(
Data = c(
'Original Poverty Training'
, 'Bootstrapped Sample 1'
, 'Bootstrapped Sample 2'
)
, MCC = mcc.boot.demo
)
kable(
DT.mcc.boot.demo
, caption = 'Bootstrapped MCC Values\n(Cutoff = 0.15)'
, align = c('l', 'c')
) %>%
kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
)
| Data | MCC |
|---|---|
| Original Poverty Training | 0.2228674 |
| Bootstrapped Sample 1 | 0.2738615 |
| Bootstrapped Sample 2 | 0.2094425 |
How can bootstrapping help account for abnormalities in the original data? By abnormality, I mean a set of observations in the original data that misrepresents the larger population, whether unfavorably or too favorably. These abnormalities can bias certain statistics (e.g., MCC values) and lead to unreasonable conclusions. Bootstrapping can reduce their impact by comparing statistics from:
Bootstrapped models applied to the same bootstrapped sample
Bootstrapped models applied to the original data
Models built from and applied to the original data
(plot.training.actual +
geom_line(aes(y = modeled)) +
geom_line(
data = DT.plot.boot.demo[
scoredData %like% 'Score'
, .(
counts = .N
, modeled = mean(Pov89_modeled.scored)
)
, by = .(
scoredData
, zAFQT89_banded = floor(zAFQT89 * 10)/10
)
]
, aes(
x = zAFQT89_banded
, y = modeled
, color = factor(scoredData)
)
, linetype = 2
) +
guides(color = 'none')) %>%
ggplotly() %>%
layout(
legend = list(
orientation = "h"
, xanchor = "center"
, x = 0.5
, y = 0
)
)
Plot showing example of modeled probabilities from bootstrapped model applied to the original data
mcc.optimism.demo <- c(
calculateMCC.boot(
DT.probabilities = DT.boot.demo[['modeled training']][bootIteration == 1]
, probabilityString = 'modeled_probability'
, actualString = 'actual'
, bootLabelString = ''
, numCutoffs = 1
, singleCutoff = 0.15
)[, mcc]
, calculateMCC.boot(
DT.probabilities = DT.boot.demo[['modeled training']][bootIteration == 2]
, probabilityString = 'modeled_probability'
, actualString = 'actual'
, bootLabelString = ''
, numCutoffs = 1
, singleCutoff = 0.15
)[,mcc]
)
DT.mcc.optimism.demo <- data.table(
Data = c(
'Bootstrapped Sample 1'
, 'Bootstrapped Sample 2'
)
, `Applied to Bootstrapped Model` = mcc.boot.demo[-1]
, `Applied to Original Model` = mcc.optimism.demo
, `Difference in MCC Values` = mcc.boot.demo[-1] - mcc.optimism.demo
)
optimism.demo <- mean(mcc.boot.demo[-1] - mcc.optimism.demo)
kable(
DT.mcc.optimism.demo
, caption = 'Bootstrapped Optimism for MCC (Cutoff = 0.15)'
, align = c('l', rep('c', 3))
) %>%
kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
) %>%
add_header_above(c(' ' = 1, 'MCC' = 2, ' ' = 1)) %>%
row_spec(2, hline_after = TRUE)
| Data | Applied to Bootstrapped Model | Applied to Original Model | Difference in MCC Values |
|---|---|---|---|
| Bootstrapped Sample 1 | 0.2738615 | 0.2184252 | 0.0554363 |
| Bootstrapped Sample 2 | 0.2094425 | 0.2279203 | -0.0184778 |
\[ \begin{aligned} \text{Optimism-corrected MCC Value} &= \text{Apparent MCC Value} - \text{Optimism Estimate} \\ &= 0.22286 - 0.01847 \\ &= 0.20438 \\ \end{aligned} \]
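A quick numerical check of this arithmetic, using the apparent MCC and the two bootstrap differences from the demo tables above. Note that the report truncates (floors) displayed values rather than rounding, which is why the corrected value reads 0.20438 instead of a rounded 0.20439.

```r
# Optimism-corrected MCC from the two demo iterations above.
apparent  <- 0.2228674                        # MCC of the original model on the original data
optimism  <- mean(c(0.0554363, -0.0184778))   # mean of the two bootstrap differences
corrected <- apparent - optimism
floor(corrected * 1e5) / 1e5                  # 0.20438, under this report's truncation convention
```

With 10,000 iterations, the optimism estimate averages over far more than two differences and is correspondingly more stable.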
Since reproducing TBC’s models is a goal for this project, I want the exact data used by the authors. With the NLSY79 being a public dataset, this is possible. However, the NLSY79 uses a special variable coding system which was not noted in TBC. To the authors’ credit, they did release their modified datasets online and included a data dictionary12. In order to reproduce TBC’s results as closely as possible, I opted to use the authors’ data (i.e., “Nation.txt”).
Another consideration is how the training and holdout datasets are defined. The authors’ data includes a variable called “Sample”. As described in the NLSY79 User’s Guide, this Sample variable has the following three values:
- A cross-sectional sample of 6,111 respondents designed to be representative of the noninstitutionalized civilian segment of young people living in the United States in 1979 and born between January 1, 1957, and December 31, 1964 (ages 14–21 as of December 31, 1978)
- A supplemental sample of 5,295 respondents designed to oversample civilian Hispanic, black, and economically disadvantaged non-black/non-Hispanic youth living in the United States during 1979 and born between January 1, 1957, and December 31, 1964
- A sample of 1,280 respondents designed to represent the population born between January 1, 1957, and December 31, 1961 (ages 17–21 as of December 31, 1978), and who were enlisted in one of the four branches of the military as of September 30, 1978
For a given TBC model to reproduce, the training data is based on the cross-sectional partition (exactly the sample TBC’s authors use). The holdout data is based on the supplemental partition. This choice of holdout dataset is motivated by Dr. Krenz’s classification analysis, where she used the supplemental sample as a holdout.
Also needed for reproduction are the exact model specifications. The authors listed parameter and diagnostic output for each model in Appendix 4 of TBC. I used these to experiment with the “Nation.txt” dataset in order to reproduce TBC’s models as closely as possible. 23 of the 24 reproduced models agreed with the book’s listed parameter values to 4 decimal places or more. The 24th model reproduced the parameter values to similar precision, but the coefficients for its categorical variable differed. I suspect these differences come from the authors’ use of an older version of Stata and my use of R. (Perhaps the two modeling implementations handle categorical levels differently in the design matrix, causing different parameter values.)
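That design-matrix suspicion is easy to demonstrate in miniature: reordering a categorical variable's levels (something two statistical packages may do differently by default) changes the reported coefficients without changing the fitted probabilities. The sketch below uses simulated data, not “Nation.txt”.

```r
# Re-ordering factor levels changes coefficients, not fitted probabilities.
set.seed(1)
g <- factor(sample(c("a", "b", "c"), 300, replace = TRUE))
y <- rbinom(300, 1, ifelse(g == "a", 0.2, 0.5))
fit1 <- glm(y ~ g, family = binomial)
fit2 <- glm(y ~ relevel(g, ref = "c"), family = binomial)  # different reference level
coef(fit1)[["(Intercept)"]] == coef(fit2)[["(Intercept)"]] # FALSE: coefficients differ
all.equal(unname(fitted(fit1)), unname(fitted(fit2)))      # TRUE: identical probabilities
```

Because the two parameterizations span the same model space, any classification analysis built on the fitted probabilities is unaffected by such software differences.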
Below, I list the name I give each TBC model, along with its category, a description of its binary target, and the sample sizes and binary-target frequencies for its training and holdout data:
## Note: The following variables are not present in Nation.txt. I created these to facilitate reproducing TBC's models. See TBC_Bootstrap for details.
# DONT RUN
# GEDvHSGr_Ind = ifelse(GEDvHSGr == 'GED', 1, 0)
# LTHSvHS_Ind = ifelse(LTHSvHS == 'LTHS', 1, 0)
# WedBy30_Ind = ifelse(WedBy30 == '1 Yes', 1, 0)
# DONT RUN
kable(
outputs[tbc.models[, .(label, labelAppendix)], on = 'label'][
, .(
`Model` = labelAppendix
, Category = category
, `Description of Binary Target` = target.description
, `(Training)` = nobs.training
, `(Holdout)` = nobs.holdout
, `(Training)` = floor(response.mean.training * 1e3)/1e3
, `(Holdout)` = floor(response.mean.holdout * 1e3)/1e3
)
]
, caption = 'Table of TBC model properties'
, align = c('l', 'c', 'l', rep('c', 4))
) %>%
kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
, fixed_thead = TRUE
) %>%
add_header_above(c(' ' = 3, 'NLSY79 Participants' = 2, 'Frequency of Binary Target' = 2))
| Model | Category | Description of Binary Target | (Training) | (Holdout) | (Training) | (Holdout) |
|---|---|---|---|---|---|---|
| Poverty | Poverty | Under the official poverty line in 1989 | 3367 | 1067 | 0.072 | 0.147 |
| Poverty (Highschool Subset) | Poverty | Under the official poverty line in 1989 | 1236 | 309 | 0.080 | 0.097 |
| Dropout | Education | Permanently dropped out of high school | 3572 | 1090 | 0.101 | 0.272 |
| Dropout (With Interaction) | Education | Permanently dropped out of high school | 3572 | 1090 | 0.101 | 0.272 |
| Get GED | Education | Received a GED instead of a high school diploma | 3494 | 941 | 0.081 | 0.157 |
| Get Bachelors | Education | Received a bachelor’s degree | 3857 | 1238 | 0.253 | 0.212 |
| Out of the Labor Force | Employment | Out of the labor force for four weeks or more in 1989 | 1686 | 488 | 0.102 | 0.168 |
| Out of the Labor Force (Highschool Subset) | Employment | Out of the labor force for four weeks or more in 1989 | 621 | 126 | 0.072 | 0.142 |
| Out of the Labor Force (College Subset) | Employment | Out of the labor force for four weeks or more in 1989 | 268 | 58 | 0.063 | 0.051 |
| Unemployed | Employment | Unemployed for four weeks or more in 1989 | 1397 | 370 | 0.071 | 0.108 |
| Unemployed (Highschool Subset) | Employment | Unemployed for four weeks or more in 1989 | 537 | 99 | 0.074 | 0.080 |
| Unemployed (College Subset) | Employment | Unemployed for four weeks or more in 1989 | 228 | 53 | 0.048 | 0.037 |
| Ever Married | Marriage | Ever married before the age of 30 | 1634 | 670 | 0.787 | 0.788 |
| Ever Married (Highschool Subset) | Marriage | Ever married before the age of 30 | 605 | 179 | 0.839 | 0.832 |
| Ever Married (College Subset) | Marriage | Ever married before the age of 30 | 237 | 130 | 0.696 | 0.769 |
| Divorced in 1st 5 Years of Marriage | Marriage | Divorced within the first 5 years of marriage | 2031 | 745 | 0.198 | 0.217 |
| Divorced in 1st 5 Years of Marriage (Highschool Subset) | Marriage | Divorced within the first 5 years of marriage | 875 | 252 | 0.195 | 0.253 |
| Divorced in 1st 5 Years of Marriage (College Subset) | Marriage | Divorced within the first 5 years of marriage | 209 | 93 | 0.071 | 0.086 |
| Divorced in 1st 5 Years of Marriage (With Parents Factors) | Marriage | Divorced within the first 5 years of marriage | 2029 | 745 | 0.198 | 0.217 |
| Surveyed in Jail | Crime | Subjects interviewed in jail at least once from 1979 to 1990 | 1945 | 554 | 0.027 | 0.084 |
| Surveyed in Jail (Highschool Subset) | Crime | Subjects interviewed in jail at least once from 1979 to 1990 | 716 | 143 | 0.011 | 0.034 |
| “Middle Class Values” | “Middle Class Values” | Subjects scored ‘yes’ on the Middle Class Values Index | 3029 | 1064 | 0.508 | 0.339 |
| “Middle Class Values” (Highschool Subset) | “Middle Class Values” | Subjects scored ‘yes’ on the Middle Class Values Index | 1162 | 303 | 0.593 | 0.498 |
| “Middle Class Values” (College Subset) | “Middle Class Values” | Subjects scored ‘yes’ on the Middle Class Values Index | 402 | 114 | 0.796 | 0.771 |
The statistical methods of this report are described in the figure below. Since this procedure is non-trivial, I summarize it as follows:
For 10,000 bootstrap iterations and each TBC model:
Define two bootstrap samples:
Bootstrapped training data
Bootstrapped holdout data
Build a bootstrapped probability model using training data
With this model, score the:
Bootstrapped training sample
Bootstrapped holdout sample
Original training sample
Original holdout sample
For each scored data set, build 500 classifications using cutoff probabilities in a linear grid across (0, 1)
For each classifier, calculate the Matthews correlation coefficient (MCC)
For each cutoff value, calculate:
The optimism for MCC
Bias-corrected accelerated (BCa) 95% confidence intervals13 for training data
Bootstrapped percentile 95% confidence intervals for holdout data
Lastly, for each model, find the cutoff value with the largest mean bootstrapped MCC value, i.e., the “optimal cutoff”.
Note: MCC results for the holdout sample use bootstrapped percentile confidence intervals because the fitted probability models were built from the training data; BCa intervals are therefore not meaningful for holdout MCC values.
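The classification and scoring steps above can be sketched directly in R. This is an illustrative reimplementation, not the project's code (the actual helpers live in 100_Resampling.R and 200_Classification.R); the function names here are mine:

```r
# Matthews correlation coefficient for a binary target vs. a classification
mcc <- function(actual, predicted) {
  tp <- sum(actual == 1 & predicted == 1)
  tn <- sum(actual == 0 & predicted == 0)
  fp <- sum(actual == 0 & predicted == 1)
  fn <- sum(actual == 1 & predicted == 0)
  denom <- sqrt(tp + fp) * sqrt(tp + fn) * sqrt(tn + fp) * sqrt(tn + fn)
  if (denom == 0) return(0)  # convention when a confusion-matrix margin is empty
  (tp * tn - fp * fn) / denom
}

# MCC across a linear grid of cutoff probabilities in (0, 1)
mcc.by.cutoff <- function(actual, probabilities, n.cutoffs = 500) {
  cutoffs <- seq(1 / n.cutoffs, 1 - 1 / n.cutoffs, length.out = n.cutoffs)
  sapply(cutoffs, function(k) mcc(actual, as.integer(probabilities >= k)))
}
```

The optimism correction then subtracts, from the apparent MCC, the average gap between each bootstrap model's MCC on its own bootstrap sample and its MCC on the original sample.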
# For LaTeX code used to generate this png, see:
# https://github.com/wfrierson/predictions_in_the_bell_curve/blob/master/200%20Code/Bootstrap%20Validation%20Algorithm.tex
knitr::include_graphics(file.path(code.folder, 'Bootstrap Validation Algorithm.png'))
Procedure used to produce distributions of MCC values with optimism correction
These steps are implemented in R using the following scripts within this repository:
000_Misc.R: Attaches needed packages and defines convenience functions
100_Resampling.R: Custom functions to perform resampling techniques via bootstrapping and the jackknife method
200_Classification.R: Custom functions to create classifications from modeled probabilities and then quantify their performance
300_Visualizations.R: Custom functions to create various visualizations to understand the predictive performance of binary classifications
400_Model_Definitions.R: Parameters used to define and reproduce logistic regression models from The Bell Curve
TBC_Bootstrap.R: Recursive code that executes a batch run for each TBC probability model. This script is the workhorse of the entire project. See its code for details.
For readers who would like to reproduce this project:
Clone the git repo
Install the relevant packages listed in 000_Misc.R. If there are versioning issues, please see the section listing my R session info.
Execute the following line of code from the project directory, which will launch the recursive batch runs for each of the 24 TBC models:
source('200 Code/TBC_Bootstrap.R', encoding = 'UTF-8', echo = FALSE)
Notes:
When 10,000 bootstrap iterations are used on all 24 models in the prior script (with parallel computing), the entire recursive batch process:
Produces 193 files in the “100 Data” folder, 6.56 MB in total
Takes 4 to 5 hours
Review the initial parameters for the batch run procedure on lines 55 - 57 to make sure they’re reasonable for your machine, namely, whether to enable parallel processing and how many CPUs to use.
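The parallel settings mentioned above live near the top of TBC_Bootstrap.R. As a hedged sketch only (the parameter and helper names below are mine, not the script's), a typical base-R configuration looks like:

```r
n.iterations <- 10000  # bootstrap iterations per model
n.cpu <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)  # leave one core free

# With parallel processing enabled, the batch run would dispatch work via, e.g.:
#   cl <- parallel::makeCluster(n.cpu)
#   parallel::parLapply(cl, model.labels, run.one.model)
#   parallel::stopCluster(cl)
```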
In this section, I present two types of results. First, I walk through output from the poverty model; this discussion is intended to help readers understand all the model output displayed in the appendix (if that detail is desired). Second, I summarize the optimal MCC values, as evaluated on the training and holdout datasets, for each model.
Since reproducibility is critical for this project, I display the model parameters associated with the original, non-bootstrapped poverty model. These values should correspond to the ones listed on p. 620 of TBC. (See the appendix on reproducing TBC models for details.)
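For orientation, the underlying fit is an ordinary logistic regression. The sketch below uses a synthetic stand-in for the NLSY79 extract (column names follow the model-definitions appendix; the simulated effect sizes are arbitrary), with the subset and formula defined for the Poverty model:

```r
# Synthetic stand-in for the NLSY79 extract
set.seed(42)
n <- 1000
nlsy <- data.frame(
  Race4   = 'White'
  , Sample  = 'XSection'
  , EmpSchl = 1
  , zAFQT89 = rnorm(n)
  , zSES    = rnorm(n)
  , zAge    = rnorm(n)
)
# Simulated poverty flag that declines with AFQT score and SES
nlsy$Pov89 <- rbinom(n, 1, plogis(-2.6 - 0.8 * nlsy$zAFQT89 - 0.3 * nlsy$zSES))

# Training subset and formula as defined for the Poverty model
dt.training <- subset(nlsy, Race4 == 'White' & Sample == 'XSection' & EmpSchl == 1)
fit.poverty <- glm(
  Pov89 ~ zAFQT89 + zSES + zAge
  , family = binomial(link = 'logit')
  , data = dt.training
)
coef(summary(fit.poverty))  # on the real data these match TBC p. 620
```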
model <- 'poverty'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | Pov89 |
| zAFQT89 | -0.83767*** |
| | (0.09351) |
| zSES | -0.33008*** |
| | (0.09010) |
| zAge | -0.02384 |
| | (0.07237) |
| Constant | -2.64877*** |
| | (0.07688) |
| Observations | 3,367 |
| Log Likelihood | -784.40180 |
| Akaike Inf. Crit. | 1,576.80400 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
The three plots below show the average of the poverty target and the modeled probabilities across a banded version of the three main variables: zAFQT89, zSES, and zAge.
model <- 'poverty'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed and bootstrapped modeled probabilities across zAFQT89 for Poverty model (Training Data)
model <- 'poverty'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed and bootstrapped modeled probabilities across zSES for Poverty model (Training Data)
model <- 'poverty'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed and bootstrapped modeled probabilities across zAge for Poverty model (Training Data)
The two plots below show the bootstrapped distributions of MCC values across a dense grid of cutoff values. The first plot shows the MCC distribution evaluated on the training data; the second shows the distribution evaluated on the holdout data.
model <- 'poverty'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
, caption = 'Table of optimal MCC values with bootstrapped confidence intervals'
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.1603206 | 0.1741112 | 0.2310828 | 0.2278948 | 0.2283286 | 0.2829347 |
| Holdout | 0.1683367 | 0.2601890 | 0.3483250 | 0.3382416 | 0.3477546 | 0.4139740 |
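The table mixes two interval types: BCa bounds for the training row and percentile bounds for the holdout row. This report computes both inside its batch scripts, but the same interval types are available from the boot package; the toy example below (unrelated to the TBC data) shows how the two can differ on a skewed statistic:

```r
library(boot)

# Bootstrap the mean of skewed draws and compare BCa vs. percentile intervals
set.seed(1)
x <- rexp(200)

boot.out <- boot(x, statistic = function(d, idx) mean(d[idx]), R = 2000)
ci <- boot.ci(boot.out, conf = 0.95, type = c('bca', 'perc'))
ci  # BCa endpoints shift to account for bias and skewness
```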
model <- 'poverty'
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
model <- 'poverty'
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
The two plots below show the averages of the poverty target, the modeled probabilities, and the classifications (built with the cutoff associated with the optimal MCC value from the training data) across a banded version of zAFQT89. The first plot shows results on the training data; the second shows results on the holdout data.
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[['poverty']][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[['poverty']][['zAFQT89']]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[['poverty']][['zAFQT89']]
, xString = 'zAFQT89'
, yString = outputs[label == 'poverty', response]
, cutoff = outputs[label == 'poverty', cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == 'poverty', target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities for Poverty model (Training Data)
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[['poverty']][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[['poverty']][['zAFQT89']]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[['poverty']][['zAFQT89']]
, xString = 'zAFQT89'
, yString = outputs[label == 'poverty', response]
, cutoff = outputs[label == 'poverty', cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == 'poverty', target.descriptions.formatted]
)
Comparing bootstrapped classifications and modeled probabilities for Poverty model (Holdout Data)
The main output of this project is the pair of plots below, which compare the optimal MCC values between the training and holdout datasets for each model. The first plot includes models where the average bootstrapped p-value for zAFQT89 is at most 5% (i.e., models where IQ tends to be statistically significant); the second includes models where that average p-value exceeds 5% (i.e., models where IQ tends not to be statistically significant). For each model, 95% confidence intervals are shown for both training and holdout data.
Note: I’m not aware of statisticians averaging bootstrapped p-values of individual coefficients. Averaging p-values is only used here to group models according to their overall noise. As shown in the plots, models with this average p-value above 5% tend to have very large 95% confidence intervals – so large that they clutter the combined plot!
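Concretely, the grouping statistic (which DT.coef.pValue appears to store per model) can be sketched on simulated data; this is illustrative only, as the project computes it during the batch run:

```r
# Simulated data with a strong IQ effect
set.seed(7)
n <- 500
d <- data.frame(zAFQT89 = rnorm(n), zSES = rnorm(n), zAge = rnorm(n))
d$y <- rbinom(n, 1, plogis(-1 - 0.8 * d$zAFQT89))

# Refit the model on bootstrap resamples, collecting the zAFQT89 p-value each time
pvals <- replicate(200, {
  b <- d[sample(n, replace = TRUE), ]
  fit <- glm(y ~ zAFQT89 + zSES + zAge, family = binomial, data = b)
  coef(summary(fit))['zAFQT89', 'Pr(>|z|)']
})
mean(pvals)  # the average bootstrapped p-value used to group models
```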
ggplot(
outputs[
, .(
label
, mcc.optimism.corrected.training
, mcc.bootstrap.lower.BCa.training
, mcc.bootstrap.upper.BCa.training
, mcc.optimism.corrected.holdout
, mcc.bootstrap.lower.bootPercentile.holdout
, mcc.bootstrap.upper.bootPercentile.holdout
, Category = category
)
][
# Only plot models where the average p-value for zAFQT89 is at most 0.05.
# The remaining models are plotted separately: their BCa intervals don't
# contain the sample MCC value, and their larger p-values indicate very
# weak models.
label %in% DT.coef.pValue[zAFQT89 <= 0.05]$label
]
, aes(
x = mcc.optimism.corrected.training
, y = mcc.optimism.corrected.holdout
, group = Category
, color = Category
, shape = Category
)
) +
geom_point(size = 4, stroke = 1.5) +
geom_errorbar(
aes(
ymin = mcc.bootstrap.lower.bootPercentile.holdout
, ymax = mcc.bootstrap.upper.bootPercentile.holdout
)
, color = 'black'
, alpha = 0.5
) +
geom_errorbarh(
aes(
xmin = mcc.bootstrap.lower.BCa.training
, xmax = mcc.bootstrap.upper.BCa.training
)
, color = 'black'
, alpha = 0.35
) +
scale_color_brewer(palette = "Dark2") +
xlab('Optimal MCC Value (Training Data, optimism-corrected)') +
ylab('Optimal MCC Value (Holdout Data, optimism-corrected)') +
lims(x = c(-0.2,1), y = c(-0.2,1)) +
theme_bw() +
theme(
axis.text=element_text(size=16)
, axis.title=element_text(size=16)
, legend.text=element_text(size=16)
, legend.title=element_text(size=16)
)
Comparing predictive power of TBC models via optimism-corrected MCC values (models where the average p-value for zAFQT89 is at most 5%)
ggplot(
outputs[
, .(
label
, mcc.optimism.corrected.training
, mcc.bootstrap.lower.BCa.training
, mcc.bootstrap.upper.BCa.training
, mcc.optimism.corrected.holdout
, mcc.bootstrap.lower.bootPercentile.holdout
, mcc.bootstrap.upper.bootPercentile.holdout
, Category = category
)
][
label %in% DT.coef.pValue[zAFQT89 > 0.05]$label
]
, aes(
x = mcc.optimism.corrected.training
, y = mcc.optimism.corrected.holdout
, group = Category
, color = Category
, shape = Category
)
) +
geom_point(size = 4, stroke = 1.5) +
geom_errorbar(
aes(
ymin = mcc.bootstrap.lower.bootPercentile.holdout
, ymax = mcc.bootstrap.upper.bootPercentile.holdout
)
, color = 'black'
, alpha = 0.5
) +
geom_errorbarh(
aes(
xmin = mcc.bootstrap.lower.BCa.training
, xmax = mcc.bootstrap.upper.BCa.training
)
, color = 'black'
, alpha = 0.35
) +
scale_color_brewer(palette = "Dark2") +
xlab('Optimal MCC Value (Training Data, optimism-corrected)') +
ylab('Optimal MCC Value (Holdout Data, optimism-corrected)') +
lims(x = c(-0.35,1), y = c(-0.35,1)) +
theme_bw() +
theme(
axis.text=element_text(size=16)
, axis.title=element_text(size=16)
, legend.text=element_text(size=16)
, legend.title=element_text(size=16)
)
Comparing predictive power of TBC models via optimism-corrected MCC values (models where the average p-value for zAFQT89 exceeds 5%)
What does the plot comparing optimal MCC values mean? Using the guidelines for interpreting MCC values listed above, nearly all TBC models show a low correlation (i.e., MCC < 50%) between the observed binary target and the bootstrapped classifications, whether evaluated on training or holdout data. The classification plots tell a similar story: the confidence intervals for the optimal classifications tend not to follow the general trend of the observed data. Together, these properties suggest that TBC’s probability models are not materially predictive, and so they inadequately support the authors’ public policy recommendations.
The 24 TBC models can be reproduced in R with the following formulae and subset definitions for the training data. (The holdout data has the same subset definition per model but with Sample == 'Supplement'.) These reproductions can be confirmed by comparing the regression tables in the Model Output section of this paper with the diagnostic data shown on TBC appendix pages cited below.
kable(
outputs[tbc.models[, .(label, labelAppendix)], on = 'label'][
, .(
`Model` = labelAppendix
, `R Code to Specify Subset of Training Data` = filter.training
, `Model Formula` = formula
, `TBC Appendix Page for Same Model` = appendix.page
)
]
, align = c(rep('l', 3), 'c')
) %>% kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
, fixed_thead = TRUE
)
| Model | R Code to Specify Subset of Training Data | Model Formula | TBC Appendix Page for Same Model |
|---|---|---|---|
| Poverty | Race4 == 'White' & Sample == 'XSection' & EmpSchl == 1 | Pov89 ~ zAFQT89 + zSES + zAge | 620 |
| Poverty (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & EmpSchl == 1 & EdSample == 'HS' | Pov89 ~ zAFQT89 + zSES + zAge | 620 |
| Dropout | Race4 == 'White' & Sample == 'XSection' & LTHSvGED != 'GED' & !is.na(AllvBA) | LTHSvHS_Ind ~ zAFQT89 + zSES + zAge | 620 |
| Dropout (With Interaction) | Race4 == 'White' & Sample == 'XSection' & LTHSvGED != 'GED' & !is.na(AllvBA) | LTHSvHS_Ind ~ zAFQT89 + zSES + zAge + zAFQT89:zSES | 621 |
| Get GED | Race4 == 'White' & Sample == 'XSection' & LTHSvGED != 'LTHS' & !is.na(AllvBA) | GEDvHSGr_Ind ~ zAFQT89 + zSES + zAge | 622 |
| Get Bachelors | Race4 == 'White' & Sample == 'XSection' & !(BA_Atta == 'InSchl/NoD' & AllvBA == 0 & is.na(AllvBA)) | AllvBA ~ zAFQT89 + zSES + zAge | 622 |
| Out of the Labor Force | Race4 == 'White' & Sample == 'XSection' & !is.na(Emp568) & Sex == 'Male' | MoOLF89 ~ zAFQT89 + zSES + zAge | 623 |
| Out of the Labor Force (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & !is.na(Emp568) & Sex == 'Male' & EdSample == 'HS' | MoOLF89 ~ zAFQT89 + zSES + zAge | 623 |
| Out of the Labor Force (College Subset) | Race4 == 'White' & Sample == 'XSection' & !is.na(Emp568) & Sex == 'Male' & EdSample == 'College' | MoOLF89 ~ zAFQT89 + zSES + zAge | 624 |
| Unemployed | Race4 == 'White' & Sample == 'XSection' & !is.na(Emp568) & Sex == 'Male' & !is.na(UnempSam) | MoUnemp8 ~ zAFQT89 + zSES + zAge | 624 |
| Unemployed (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & !is.na(Emp568) & Sex == 'Male' & !is.na(UnempSam) & EdSample == 'HS' | MoUnemp8 ~ zAFQT89 + zSES + zAge | 624 |
| Unemployed (College Subset) | Race4 == 'White' & Sample == 'XSection' & !is.na(Emp568) & Sex == 'Male' & !is.na(UnempSam) & EdSample == 'College' | MoUnemp8 ~ zAFQT89 + zSES + zAge | 625 |
| Ever Married | Race4 == 'White' & Sample == 'XSection' & IntAge90 >= 30 & !is.na(EverWed) | WedBy30_Ind ~ zAFQT89 + zSES + zAge | 625 |
| Ever Married (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & IntAge90 >= 30 & !is.na(EverWed) & EdSample == 'HS' | WedBy30_Ind ~ zAFQT89 + zSES + zAge | 626 |
| Ever Married (College Subset) | Race4 == 'White' & Sample == 'XSection' & IntAge90 >= 30 & !is.na(EverWed) & EdSample == 'College' | WedBy30_Ind ~ zAFQT89 + zSES + zAge | 626 |
| Divorced in 1st 5 Years of Marriage | Race4 == 'White' & Sample == 'XSection' & !is.na(Div5Yrs) | Div5Yrs ~ zAFQT89 + zSES + zAge + MarDate | 626 |
| Divorced in 1st 5 Years of Marriage (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & !is.na(Div5Yrs) & EdSample == 'HS' | Div5Yrs ~ zAFQT89 + zSES + zAge + MarDate | 627 |
| Divorced in 1st 5 Years of Marriage (College Subset) | Race4 == 'White' & Sample == 'XSection' & !is.na(Div5Yrs) & EdSample == 'College' | Div5Yrs ~ zAFQT89 + zSES + zAge + MarDate | 627 |
| Divorced in 1st 5 Years of Marriage (With Parents Factors) | Race4 == 'White' & Sample == 'XSection' & !is.na(Div5Yrs) & Adult14S != '' | Div5Yrs ~ zAFQT89 + zSES + zAge + Adult14S | 627 |
| Surveyed in Jail | Race4 == 'White' & Sample == 'XSection' & Sex == 'Male' | Jail ~ zAFQT89 + zSES + zAge | 645 |
| Surveyed in Jail (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & Sex == 'Male' & EdSample == 'HS' | Jail ~ zAFQT89 + zSES + zAge | 645 |
| “Middle Class Values” | Race4 == 'White' & Sample == 'XSection' | MCV_Inde ~ zAFQT89 + zSES + zAge | 646 |
| “Middle Class Values” (Highschool Subset) | Race4 == 'White' & Sample == 'XSection' & EdSample == 'HS' | MCV_Inde ~ zAFQT89 + zSES + zAge | 646 |
| “Middle Class Values” (College Subset) | Race4 == 'White' & Sample == 'XSection' & EdSample == 'College' | MCV_Inde ~ zAFQT89 + zSES + zAge | 647 |
kable(
outputs[tbc.models[, .(label, labelAppendix)], on = 'label'][
order(-mcc.optimism.corrected.training)
, .(
Model = labelAppendix
, Category = category
, `Cutoff Probability` = cutoff.mcc.training
, `Lower Bound` = mcc.bootstrap.lower.BCa.training
, `Apparent` = mcc.apparent.training
, `Bootstrapped Mean` = mcc.bootstrap.training
, `Optimism Corrected` = mcc.optimism.corrected.training
, `Upper Bound` = mcc.bootstrap.upper.BCa.training
)
]
, align = rep('c', 8)
, caption = 'Table of Optimal MCC values by TBC Model (Training Data)'
) %>%
kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
, fixed_thead = TRUE
) %>%
add_header_above(c(' ' = 4, 'Optimal MCC' = 3, ' ' = 1))
| Model | Category | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|---|
| Get Bachelors | Education | 0.2665331 | 0.5255255 | 0.5546097 | 0.5551713 | 0.5535824 | 0.5811365 |
| Dropout | Education | 0.2044088 | 0.4533255 | 0.4975504 | 0.4954082 | 0.4961282 | 0.5405721 |
| Dropout (With Interaction) | Education | 0.2184369 | 0.4341434 | 0.4885867 | 0.4939417 | 0.4868035 | 0.5241504 |
| Out of the Labor Force (College Subset) | Employment | 0.2024048 | -0.0232670 | 0.3526263 | 0.3039617 | 0.2963358 | 0.5970251 |
| “Middle Class Values” | “Middle Class Values” | 0.4348697 | 0.2339730 | 0.2710309 | 0.2731777 | 0.2692391 | 0.3033458 |
| Poverty | Poverty | 0.1603206 | 0.1741112 | 0.2310828 | 0.2278948 | 0.2283286 | 0.2829347 |
| Surveyed in Jail (Highschool Subset) | Crime | 0.2244489 | -0.0104750 | 0.3518185 | 0.3445395 | 0.2199638 | 0.5765422 |
| Ever Married (Highschool Subset) | Marriage | 0.7054108 | 0.0942768 | 0.2312680 | 0.1916036 | 0.2092067 | 0.3823478 |
| Poverty (Highschool Subset) | Poverty | 0.0881764 | 0.1327124 | 0.1912472 | 0.1821463 | 0.1831268 | 0.2649095 |
| Get GED | Education | 0.0641283 | 0.1575783 | 0.1825203 | 0.1775072 | 0.1801284 | 0.2153326 |
| Divorced in 1st 5 Years of Marriage (College Subset) | Marriage | 0.0901804 | -0.0555556 | 0.2374482 | 0.2598149 | 0.1755364 | 0.3716570 |
| Surveyed in Jail | Crime | 0.0260521 | 0.1043961 | 0.1638284 | 0.1611350 | 0.1565315 | 0.2063444 |
| “Middle Class Values” (College Subset) | “Middle Class Values” | 0.7374749 | -0.0541101 | 0.1543930 | 0.1641656 | 0.1206757 | 0.2655808 |
| Unemployed | Employment | 0.0981964 | 0.0579804 | 0.1268596 | 0.1159671 | 0.1173604 | 0.2025520 |
| Out of the Labor Force (Highschool Subset) | Employment | 0.0921844 | 0.0571262 | 0.1365957 | 0.1147173 | 0.1127359 | 0.2549812 |
| Divorced in 1st 5 Years of Marriage | Marriage | 0.1903808 | 0.0691469 | 0.1197176 | 0.1241194 | 0.1122616 | 0.1623367 |
| Unemployed (Highschool Subset) | Employment | 0.1803607 | -0.0194870 | 0.1522530 | 0.1113602 | 0.1112092 | 0.3195728 |
| Out of the Labor Force | Employment | 0.1062124 | 0.0387634 | 0.1009407 | 0.1035068 | 0.0905966 | 0.1489757 |
| Divorced in 1st 5 Years of Marriage (With Parents Factors) | Marriage | 0.1943888 | 0.0151819 | 0.0909336 | 0.1080654 | 0.0773499 | 0.1202274 |
| “Middle Class Values” (Highschool Subset) | “Middle Class Values” | 0.5771543 | 0.0140331 | 0.0845855 | 0.0777591 | 0.0657706 | 0.1559252 |
| Ever Married | Marriage | 0.8116232 | 0.0147045 | 0.0770474 | 0.0735238 | 0.0653208 | 0.1256629 |
| Divorced in 1st 5 Years of Marriage (Highschool Subset) | Marriage | 0.2004008 | -0.1043035 | 0.0448236 | 0.0829293 | 0.0161811 | 0.0802507 |
| Ever Married (College Subset) | Marriage | 0.6693387 | -0.2084305 | -0.0098520 | 0.1014006 | -0.0863671 | 0.0447238 |
| Unemployed (College Subset) | Employment | 0.2625251 | -0.0659163 | -0.0149435 | 0.2189325 | -0.1023747 | 0.2766287 |
kable(
outputs[tbc.models[, .(label, labelAppendix)], on = 'label'][
order(-mcc.optimism.corrected.holdout)
, .(
Model = labelAppendix
, Category = category
, `Cutoff Probability` = cutoff.mcc.holdout
, `Lower Bound` = mcc.bootstrap.lower.bootPercentile.holdout
, `Apparent` = mcc.apparent.holdout
, `Bootstrapped Mean` = mcc.bootstrap.holdout
, `Optimism Corrected` = mcc.optimism.corrected.holdout
, `Upper Bound` = mcc.bootstrap.upper.bootPercentile.holdout
)
]
, align = rep('c', 8)
, caption = 'Table of Optimal MCC values by TBC Model (Holdout Data)'
) %>%
kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
, fixed_thead = TRUE
) %>%
add_header_above(c(' ' = 4, 'Optimal MCC' = 3, ' ' = 1))
| Model | Category | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|---|
| Get Bachelors | Education | 0.1503006 | 0.5964385 | 0.6521115 | 0.6459635 | 0.6525238 | 0.6939160 |
| Dropout | Education | 0.1643287 | 0.5530523 | 0.6050726 | 0.6014355 | 0.6050119 | 0.6486212 |
| Dropout (With Interaction) | Education | 0.1983968 | 0.5446363 | 0.6007592 | 0.5952430 | 0.6005778 | 0.6445841 |
| Unemployed (College Subset) | Employment | 0.2084168 | -0.0772503 | 0.4803922 | 0.2460685 | 0.4547918 | 0.8573214 |
| “Middle Class Values” | “Middle Class Values” | 0.4709419 | 0.3932675 | 0.4511110 | 0.4506696 | 0.4512280 | 0.5077860 |
| Poverty | Poverty | 0.1683367 | 0.2601890 | 0.3483250 | 0.3382416 | 0.3477546 | 0.4139740 |
| Poverty (Highschool Subset) | Poverty | 0.1863727 | 0.1285619 | 0.3355610 | 0.3117212 | 0.3369449 | 0.4782756 |
| Get GED | Education | 0.1162325 | 0.2220604 | 0.2904177 | 0.2858693 | 0.2902919 | 0.3481317 |
| Surveyed in Jail | Crime | 0.0380762 | 0.1237940 | 0.2362658 | 0.2136281 | 0.2365009 | 0.2980373 |
| Divorced in 1st 5 Years of Marriage (College Subset) | Marriage | 0.0721443 | -0.1124844 | 0.2285707 | 0.1731752 | 0.2320964 | 0.4331122 |
| Out of the Labor Force (Highschool Subset) | Employment | 0.1302605 | -0.0959208 | 0.2190890 | 0.0852661 | 0.2158204 | 0.3744309 |
| Out of the Labor Force | Employment | 0.1042084 | -0.0096561 | 0.1715436 | 0.1453647 | 0.1719140 | 0.2652515 |
| Unemployed | Employment | 0.0821643 | 0.0489113 | 0.1684439 | 0.1672231 | 0.1687170 | 0.2753726 |
| Divorced in 1st 5 Years of Marriage | Marriage | 0.1543086 | 0.0550943 | 0.1618226 | 0.1380496 | 0.1615906 | 0.2112502 |
| “Middle Class Values” (Highschool Subset) | “Middle Class Values” | 0.5430862 | -0.1051121 | 0.1361782 | 0.0346021 | 0.1350699 | 0.1646227 |
| Divorced in 1st 5 Years of Marriage (With Parents Factors) | Marriage | 0.1623246 | 0.0213149 | 0.1331039 | 0.1118409 | 0.1332435 | 0.1907022 |
| Divorced in 1st 5 Years of Marriage (Highschool Subset) | Marriage | 0.1523046 | -0.0941931 | 0.1155382 | 0.0713586 | 0.1158789 | 0.2201537 |
| Surveyed in Jail (Highschool Subset) | Crime | 0.0040080 | -0.1516913 | 0.0854826 | 0.0422934 | 0.0852632 | 0.1890059 |
| Ever Married | Marriage | 0.7695391 | -0.0430202 | 0.0630608 | 0.0466488 | 0.0623341 | 0.1385667 |
| Ever Married (Highschool Subset) | Marriage | 0.9438878 | -0.1825518 | 0.0336324 | 0.0312788 | 0.0341430 | 0.1174501 |
| Out of the Labor Force (College Subset) | Employment | 0.0020040 | -0.2745752 | 0.0309344 | 0.0248098 | 0.0290959 | 0.1792303 |
| Unemployed (Highschool Subset) | Employment | 0.1402806 | -0.0898027 | -0.0299510 | 0.0978638 | -0.0331602 | 0.4767900 |
| Ever Married (College Subset) | Marriage | 0.6432866 | -0.1408794 | -0.0482243 | 0.0646447 | -0.0496034 | 0.3081470 |
| “Middle Class Values” (College Subset) | “Middle Class Values” | 0.6773547 | -0.1091326 | -0.0511336 | 0.0724688 | -0.0597152 | 0.3261558 |
cat(paste0('### ', tbc.models[label == 'poverty', labelAppendix]))
cat('#### Model Parameters')
model <- 'poverty'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | Pov89 |
| zAFQT89 | -0.83767*** |
| | (0.09351) |
| zSES | -0.33008*** |
| | (0.09010) |
| zAge | -0.02384 |
| | (0.07237) |
| Constant | -2.64877*** |
| | (0.07688) |
| Observations | 3,367 |
| Log Likelihood | -784.40180 |
| Akaike Inf. Crit. | 1,576.80400 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'poverty'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'poverty'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'poverty'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'poverty'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'poverty'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'poverty'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'poverty'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.1603206 | 0.1741112 | 0.2310828 | 0.2278948 | 0.2283286 | 0.2829347 |
| Holdout | 0.1683367 | 0.2601890 | 0.3483250 | 0.3382416 | 0.3477546 | 0.4139740 |
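The MCC columns above report the Matthews correlation coefficient of the binary classifications obtained by thresholding the modeled probabilities at a cutoff. A minimal sketch of that computation (the function and argument names here are illustrative, not the report's actual helpers):

```r
# Matthews correlation coefficient (MCC) for the binary
# classifications produced by thresholding modeled probabilities
# at a chosen cutoff.
mcc <- function(actual, probability, cutoff) {
  predicted <- as.integer(probability >= cutoff)
  tp <- sum(predicted == 1L & actual == 1L)
  tn <- sum(predicted == 0L & actual == 0L)
  fp <- sum(predicted == 1L & actual == 0L)
  fn <- sum(predicted == 0L & actual == 1L)
  # Guard against an empty margin, where MCC is undefined;
  # return 0 by convention in that case.
  denominator <- sqrt(tp + fp) * sqrt(tp + fn) * sqrt(tn + fp) * sqrt(tn + fn)
  if (denominator == 0) return(0)
  (tp * tn - fp * fn) / denominator
}
```

As the "Optimal MCC" header suggests, the Cutoff Probability column is the cutoff that maximizes this statistic over a grid of candidate cutoffs, and the Apparent column is the MCC evaluated at that cutoff on the stated dataset.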
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
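The Optimism Corrected column in the MCC tables applies the usual bootstrap optimism correction: the apparent statistic is penalized by the average amount a model refit on a bootstrap resample overstates its own performance relative to the original data. (The Lower and Upper Bound columns, per their field names, are BCa bootstrap limits on the training data and simple bootstrap percentile limits on the holdout data.) A sketch of the correction, with illustrative function names rather than the report's actual helpers:

```r
# Bootstrap optimism correction of a performance statistic:
# refit the model on each resample, score it on both the resample
# and the original data, and subtract the mean gap (the optimism)
# from the apparent score.
optimismCorrected <- function(DT, fitFun, perfFun, B = 200) {
  apparent <- perfFun(fitFun(DT), DT)
  optimism <- mean(replicate(B, {
    DT.boot <- DT[sample(nrow(DT), replace = TRUE), ]
    model.boot <- fitFun(DT.boot)
    perfFun(model.boot, DT.boot) - perfFun(model.boot, DT)
  }))
  apparent - optimism
}
```

Here `fitFun` takes a dataset and returns a fitted model, and `perfFun` takes a model and a dataset and returns the performance statistic (e.g. the MCC at a fixed cutoff).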
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'poverty.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'poverty.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | Pov89 |
| zAFQT89 | -0.82673*** |
| | (0.16273) |
| zSES | -0.36197** |
| | (0.14998) |
| zAge | 0.10492 |
| | (0.10946) |
| Constant | -2.72378*** |
| | (0.12902) |
| Observations | 1,236 |
| Log Likelihood | -325.26940 |
| Akaike Inf. Crit. | 658.53880 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
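The coefficients above come from an ordinary logistic regression. A minimal sketch of the fitting call the table implies, using the naming conventions of this appendix (the exact chunk lives elsewhere in the report's source):

```r
# Reproduced TBC probability model: logistic regression of the
# binary poverty indicator Pov89 on standardized AFQT score,
# SES index, and age.
glm.poverty.hs <- glm(
  Pov89 ~ zAFQT89 + zSES + zAge
  , family = binomial(link = 'logit')
  , data = inputs[['poverty.hs']][['DT.model.training']]
)
summary(glm.poverty.hs)

# Modeled probabilities for any dataset then follow from predict(), e.g.
# predict(glm.poverty.hs, newdata = DT.holdout, type = 'response')
```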
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'poverty.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'poverty.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'poverty.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'poverty.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'poverty.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'poverty.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'poverty.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.0881764 | 0.1327124 | 0.1912472 | 0.1821463 | 0.1831268 | 0.2649095 |
| Holdout | 0.1863727 | 0.1285619 | 0.3355610 | 0.3117212 | 0.3369449 | 0.4782756 |
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'dropout', labelAppendix]))
cat('#### Model Parameters')
model <- 'dropout'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | LTHSvHS |
| zAFQT89 | -1.72296*** |
| | (0.10281) |
| zSES | -0.64776*** |
| | (0.08966) |
| zAge | 0.05696 |
| | (0.06883) |
| Constant | -2.85323*** |
| | (0.09395) |
| Observations | 3,572 |
| Log Likelihood | -779.99040 |
| Akaike Inf. Crit. | 1,567.98100 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'dropout'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'dropout'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'dropout'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'dropout'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'dropout'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'dropout'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'dropout'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2044088 | 0.4533255 | 0.4975504 | 0.4954082 | 0.4961282 | 0.5405721 |
| Holdout | 0.1643287 | 0.5530523 | 0.6050726 | 0.6014355 | 0.6050119 | 0.6486212 |
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'dropout_interaction', labelAppendix]))
cat('#### Model Parameters')
model <- 'dropout_interaction'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | LTHSvHS |
| zAFQT89 | -1.89377*** |
| | (0.11884) |
| zSES | -0.94024*** |
| | (0.12505) |
| zAge | 0.05227 |
| | (0.06827) |
| zAFQT89:zSES | -0.41332*** |
| | (0.11878) |
| Constant | -2.91432*** |
| | (0.10293) |
| Observations | 3,572 |
| Log Likelihood | -773.90060 |
| Akaike Inf. Crit. | 1,557.80100 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'dropout_interaction'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'dropout_interaction'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'dropout_interaction'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'dropout_interaction'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'dropout_interaction'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'dropout_interaction'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'dropout_interaction'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2184369 | 0.4341434 | 0.4885867 | 0.4939417 | 0.4868035 | 0.5241504 |
| Holdout | 0.1983968 | 0.5446363 | 0.6007592 | 0.5952430 | 0.6005778 | 0.6445841 |
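The table above reports the Matthews correlation coefficient (MCC) obtained by thresholding modeled probabilities at the optimal cutoff. As a minimal illustration of the statistic itself (a Python sketch, not the report's R routines; `mcc_at_cutoff` is a hypothetical name):

```python
import math

def mcc_at_cutoff(y_true, p_hat, cutoff):
    """Matthews correlation coefficient after thresholding
    predicted probabilities at `cutoff`."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike raw accuracy, MCC stays near zero for uninformative classifiers even when one class dominates, which is why the report optimizes the cutoff against it.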
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'get_ged', labelAppendix]))
cat('#### Model Parameters')
model <- 'get_ged'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| Dependent variable: | GEDvHSGr |
|---|---|
| zAFQT89 | -0.43253*** (0.08512) |
| zSES | -0.60822*** (0.08375) |
| zAge | -0.04164 (0.06624) |
| Constant | -2.35485*** (0.06539) |
| Observations | 3,494 |
| Log Likelihood | -915.28150 |
| Akaike Inf. Crit. | 1,838.56300 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
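The coefficients above are on the log-odds scale. As a sketch of how they translate into a modeled probability (Python for illustration, plugging in the point estimates from the table; the report's actual predictions come from R's `predict`):

```python
import math

def inv_logit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Point estimates from the GED-vs-HS-graduate table above
b0, b_afqt, b_ses, b_age = -2.35485, -0.43253, -0.60822, -0.04164

def p_ged(zAFQT89, zSES, zAge):
    """Modeled probability of holding a GED rather than a diploma."""
    eta = b0 + b_afqt * zAFQT89 + b_ses * zSES + b_age * zAge
    return inv_logit(eta)
```

At the means of all three standardized predictors (z = 0), the modeled probability is inv_logit(-2.35485), roughly 0.087.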
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'get_ged'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'get_ged'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'get_ged'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'get_ged'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'get_ged'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'get_ged'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'get_ged'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.0641283 | 0.1575783 | 0.1825203 | 0.1775072 | 0.1801284 | 0.2153326 |
| Holdout | 0.1162325 | 0.2220604 | 0.2904177 | 0.2858693 | 0.2902919 | 0.3481317 |
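The "Optimism Corrected" column follows the usual bootstrap-optimism recipe: refit on each bootstrap resample, score that refit model on both the resample and the original data, and subtract the average gap from the apparent estimate. A generic sketch under stated assumptions (Python; `fit` and `score` stand in for the report's model-fitting and MCC routines):

```python
import random

def optimism_corrected(data, fit, score, B=200, seed=0):
    """Bootstrap optimism correction for an apparent performance
    estimate (e.g. an MCC computed on the training data)."""
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    optimism = 0.0
    for _ in range(B):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        m = fit(boot)
        # Optimism = performance on the resample minus performance
        # of the same model on the original data
        optimism += score(m, boot) - score(m, data)
    return apparent - optimism / B
```

When a model overfits, each refit scores better on its own resample than on the original data, so the correction pulls the apparent estimate down.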
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'get_bachelors', labelAppendix]))
cat('#### Model Parameters')
model <- 'get_bachelors'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| Dependent variable: | AllvBA |
|---|---|
| zAFQT89 | 1.77473*** (0.07825) |
| zSES | 1.01514*** (0.06784) |
| zAge | -0.30316*** (0.05113) |
| Constant | -2.41463*** (0.07783) |
| Observations | 3,857 |
| Log Likelihood | -1,389.47300 |
| Akaike Inf. Crit. | 2,786.94500 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'get_bachelors'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'get_bachelors'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'get_bachelors'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'get_bachelors'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'get_bachelors'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'get_bachelors'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'get_bachelors'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2665331 | 0.5255255 | 0.5546097 | 0.5551713 | 0.5535824 | 0.5811365 |
| Holdout | 0.1503006 | 0.5964385 | 0.6521115 | 0.6459635 | 0.6525238 | 0.6939160 |
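The holdout bounds above come from simple bootstrap percentiles (the `bootPercentile` columns in the code), while the training bounds use BCa intervals. A minimal percentile sketch (Python for illustration; `stat` stands in for whatever statistic is bootstrapped, here the MCC):

```python
import random

def boot_percentile_ci(values, stat, alpha=0.05, B=1000, seed=0):
    """Percentile bootstrap confidence interval for `stat`."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(values) for _ in values]) for _ in range(B)
    )
    lo = reps[int((alpha / 2) * B)]
    hi = reps[int((1 - alpha / 2) * B) - 1]
    return lo, hi
```

BCa intervals adjust these raw percentiles for bias and skew in the bootstrap distribution, which is why the training and holdout bound columns are computed differently.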
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'oolf4wks', labelAppendix]))
cat('#### Model Parameters')
model <- 'oolf4wks'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| Dependent variable: | MoOLF89 |
|---|---|
| zAFQT89 | -0.36247*** (0.09928) |
| zSES | 0.21788** (0.10757) |
| zAge | -0.12815 (0.08640) |
| Constant | -2.20264*** (0.08680) |
| Observations | 1,686 |
| Log Likelihood | -548.25140 |
| Akaike Inf. Crit. | 1,104.50300 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'oolf4wks'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'oolf4wks'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'oolf4wks'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'oolf4wks'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'oolf4wks'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'oolf4wks'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'oolf4wks'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.1062124 | 0.0387634 | 0.1009407 | 0.1035068 | 0.0905966 | 0.1489757 |
| Holdout | 0.1042084 | -0.0096561 | 0.1715436 | 0.1453647 | 0.1719140 | 0.2652515 |
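The report's own `plotPerformanceMetric.cutoff.bootCI` helpers are not reproduced here. As a rough sketch of what the "Cutoff Probability" column means — a cutoff chosen to maximize the Matthews correlation coefficient (MCC) — consider the following, where `mcc` and `optimalCutoff` are illustrative stand-ins, not the report's functions:

```r
# Hypothetical sketch (not the report's helpers): compute MCC at one
# probability cutoff, then sweep cutoffs to find the maximizer, as in
# the "Optimal MCC" table above.
mcc <- function(actual, prob, cutoff) {
  pred <- as.integer(prob >= cutoff)
  tp <- sum(actual == 1 & pred == 1); tn <- sum(actual == 0 & pred == 0)
  fp <- sum(actual == 0 & pred == 1); fn <- sum(actual == 1 & pred == 0)
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) 0 else (tp * tn - fp * fn) / denom
}

optimalCutoff <- function(actual, prob, cutoffs = seq(0.01, 0.99, by = 0.01)) {
  scores <- vapply(cutoffs, function(k) mcc(actual, prob, k), numeric(1))
  list(cutoff = cutoffs[which.max(scores)], mcc = max(scores))
}
```

Unlike accuracy, MCC stays near zero for an uninformative classifier even when the response is rare, which is why it suits the low-frequency outcomes modeled here.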
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'oolf4wks.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'oolf4wks.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
Dependent variable: MoOLF89

| Term | Coefficient (SE) |
|---|---|
| zAFQT89 | -0.42151* (0.22643) |
| zSES | 0.56490** (0.23004) |
| zAge | -0.14557 (0.16725) |
| Constant | -2.69780*** (0.17674) |
| Observations | 621 |
| Log Likelihood | -156.98050 |
| Akaike Inf. Crit. | 321.96090 |

Note: \*p<0.1; \*\*p<0.05; \*\*\*p<0.01
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'oolf4wks.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'oolf4wks.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'oolf4wks.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'oolf4wks.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'oolf4wks.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'oolf4wks.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'oolf4wks.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.0921844 | 0.0571262 | 0.1365957 | 0.1147173 | 0.1127359 | 0.2549812 |
| Holdout | 0.1302605 | -0.0959208 | 0.2190890 | 0.0852661 | 0.2158204 | 0.3744309 |
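The "Optimism Corrected" column in these tables follows the bootstrap optimism correction of Efron and Harrell. A minimal sketch, assuming a placeholder `fitAndScore(train, test)` that refits the model on `train` and returns the MCC scored on `test` (this is not the report's internal function):

```r
# Hypothetical sketch of bootstrap optimism correction: refit on each
# resample, score it on the resample (apparent) and on the original data
# (test); the mean gap is the "optimism" subtracted from the full-sample
# apparent metric. `fitAndScore` is a placeholder supplied by the caller.
optimismCorrected <- function(DT, fitAndScore, B = 200) {
  apparent <- fitAndScore(train = DT, test = DT)
  optimism <- mean(vapply(seq_len(B), function(b) {
    idx  <- sample(nrow(DT), replace = TRUE)
    boot <- DT[idx, , drop = FALSE]
    fitAndScore(train = boot, test = boot) -
      fitAndScore(train = boot, test = DT)
  }, numeric(1)))
  apparent - optimism
}
```

The corrected value estimates how the model would score on new data drawn from the same population, which is why it typically sits below the apparent training metric.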
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'oolf4wks.col', labelAppendix]))
cat('#### Model Parameters')
model <- 'oolf4wks.col'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
Dependent variable: MoOLF89

| Term | Coefficient (SE) |
|---|---|
| zAFQT89 | -0.84324* (0.45267) |
| zSES | 0.94515** (0.38754) |
| zAge | -0.46062 (0.29904) |
| Constant | -3.12957*** (0.60817) |
| Observations | 268 |
| Log Likelihood | -56.53686 |
| Akaike Inf. Crit. | 121.07370 |

Note: \*p<0.1; \*\*p<0.05; \*\*\*p<0.01
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'oolf4wks.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'oolf4wks.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'oolf4wks.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'oolf4wks.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'oolf4wks.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'oolf4wks.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'oolf4wks.col'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2024048 | -0.0232670 | 0.3526263 | 0.3039617 | 0.2963358 | 0.5970251 |
| Holdout | 0.0020040 | -0.2745752 | 0.0309344 | 0.0248098 | 0.0290959 | 0.1792303 |
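Note that the bounds in these tables mix two interval types: bias-corrected accelerated (BCa) intervals on training and plain bootstrap percentile intervals on holdout, as the `outputs` column names indicate. With the `boot` package both come from `boot.ci` — a toy illustration on the mean of a simulated sample, not the report's own helper code:

```r
# Illustrative only: percentile vs BCa bootstrap intervals via the
# boot package, on the mean of a simulated sample.
library(boot)

set.seed(1)
x <- rnorm(100)
b <- boot(x, statistic = function(d, i) mean(d[i]), R = 2000)

# "perc" matches the holdout-style bounds; "bca" (bias-corrected and
# accelerated) matches the training-style bounds.
ci <- boot.ci(b, conf = 0.95, type = c("perc", "bca"))
ci$percent[4:5]  # percentile lower/upper
ci$bca[4:5]      # BCa lower/upper
```

BCa adjusts the percentile endpoints for bias and skew in the bootstrap distribution, so the two interval types can disagree noticeably on small samples like the college-graduate subset above.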
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'unemployed4wks', labelAppendix]))
cat('#### Model Parameters')
model <- 'unemployed4wks'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
Dependent variable: MoUnemp8

| Term | Coefficient (SE) |
|---|---|
| zAFQT89 | -0.49487*** (0.12989) |
| zSES | -0.02535 (0.13838) |
| zAge | -0.02181 (0.11084) |
| Constant | -2.53577*** (0.10760) |
| Observations | 1,397 |
| Log Likelihood | -348.71510 |
| Akaike Inf. Crit. | 705.43020 |

Note: \*p<0.1; \*\*p<0.05; \*\*\*p<0.01
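One hedged way to read the logit coefficients above: with the other standardized predictors held at their means (z-scores of 0), the intercept and the zAFQT89 coefficient alone give the modeled probabilities. A sketch using the fitted values from the table:

```r
# Hypothetical reading of the unemployed4wks coefficients above.
# With zSES and zAge held at 0, compare the modeled probability of a
# month or more of unemployment at the AFQT mean versus one standard
# deviation below it.
coefs <- c(intercept = -2.53577, zAFQT89 = -0.49487)
p_mean  <- plogis(coefs[["intercept"]])                      # zAFQT89 = 0
p_lowIQ <- plogis(coefs[["intercept"]] - coefs[["zAFQT89"]]) # zAFQT89 = -1
round(c(p_mean, p_lowIQ), 3)
```

Because the model is linear only on the log-odds scale, the probability change from a one-SD shift depends on where the other predictors sit; this sketch fixes them at their means.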
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'unemployed4wks'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'unemployed4wks'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'unemployed4wks'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'unemployed4wks'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'unemployed4wks'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'unemployed4wks'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'unemployed4wks'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.0981964 | 0.0579804 | 0.1268596 | 0.1159671 | 0.1173604 | 0.2025520 |
| Holdout | 0.0821643 | 0.0489113 | 0.1684439 | 0.1672231 | 0.1687170 | 0.2753726 |
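The table reports the Matthews correlation coefficient (MCC) of the classifications obtained by thresholding modeled probabilities, at the cutoff that maximizes it. A sketch of both quantities, with assumed helper names (the report's own implementation is not shown); the denominator is computed as a product of square roots to avoid overflow on large confusion-matrix counts:

```r
# MCC of the binary classifications produced by thresholding modeled
# probabilities at a cutoff, plus a grid search for the optimal cutoff.
mcc_at_cutoff <- function(y, p, cutoff) {
  pred <- as.integer(p >= cutoff)
  tp <- sum(pred == 1 & y == 1); tn <- sum(pred == 0 & y == 0)
  fp <- sum(pred == 1 & y == 0); fn <- sum(pred == 0 & y == 1)
  denom <- sqrt(tp + fp) * sqrt(tp + fn) * sqrt(tn + fp) * sqrt(tn + fn)
  if (denom == 0) return(0)                 # degenerate confusion matrix
  (tp * tn - fp * fn) / denom
}

best_mcc_cutoff <- function(y, p, grid = sort(unique(p))) {
  mccs <- vapply(grid, function(k) mcc_at_cutoff(y, p, k), numeric(1))
  list(cutoff = grid[which.max(mccs)], mcc = max(mccs))
}
```

Unlike accuracy, MCC stays near zero for uninformative classifiers even when the response is heavily imbalanced, which is why it suits rare outcomes like this one.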
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'unemployed4wks.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'unemployed4wks.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | MoUnemp8 |
| zAFQT89 | -0.39353\* |
| | (0.23687) |
| zSES | 0.13952 |
| | (0.23532) |
| zAge | -0.10511 |
| | (0.17625) |
| Constant | -2.59878\*\*\* |
| | (0.17661) |
| Observations | 537 |
| Log Likelihood | -140.49120 |
| Akaike Inf. Crit. | 288.98240 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
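Each coefficient table in this appendix is a plain logistic regression rendered by `stargazer`. A minimal sketch of the fit with simulated stand-in data (the real NLSY79 variables are not reproduced here, so the coefficient values will not match the table):

```r
# Logistic regression of the form used throughout the appendix, fit on
# simulated stand-in data. Only the model structure matches the report.
set.seed(1)
n <- 537                                    # observation count above
DT <- data.frame(zAFQT89 = rnorm(n), zSES = rnorm(n), zAge = rnorm(n))
DT$MoUnemp8 <- rbinom(n, 1, plogis(-2.6 - 0.4 * DT$zAFQT89))
fit <- glm(MoUnemp8 ~ zAFQT89 + zSES + zAge, family = binomial, data = DT)
summary(fit)$coefficients   # estimate, SE, z value, p value per term
```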
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'unemployed4wks.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'unemployed4wks.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'unemployed4wks.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'unemployed4wks.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'unemployed4wks.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'unemployed4wks.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'unemployed4wks.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.1803607 | -0.0194870 | 0.152253 | 0.1113602 | 0.1112092 | 0.3195728 |
| Holdout | 0.1402806 | -0.0898027 | -0.029951 | 0.0978638 | -0.0331602 | 0.4767900 |
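The "Optimism Corrected" column follows the usual Efron bootstrap recipe (an assumption on my part; the report's own implementation is not shown): refit on each resample, score the refit on both the resample and the original data, and subtract the mean of that gap from the apparent performance. A self-contained sketch:

```r
# Optimism-corrected performance estimate via the bootstrap. `metric`
# is any function of (observed y, predicted probability), e.g. MCC at a
# fixed cutoff. Assumed recipe, not the report's exact code.
optimism_corrected <- function(DT, formula, metric, B = 200) {
  fit <- glm(formula, family = binomial, data = DT)
  y <- model.response(model.frame(formula, DT))
  apparent <- metric(y, fitted(fit))
  optimism <- replicate(B, {
    DT.b <- DT[sample(nrow(DT), replace = TRUE), ]
    fit.b <- glm(formula, family = binomial, data = DT.b)
    y.b <- model.response(model.frame(formula, DT.b))
    # performance on the resample minus performance of the resampled
    # model scored back on the original data
    metric(y.b, fitted(fit.b)) -
      metric(y, predict(fit.b, newdata = DT, type = "response"))
  })
  apparent - mean(optimism)
}
```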
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'unemployed4wks.col', labelAppendix]))
cat('#### Model Parameters')
model <- 'unemployed4wks.col'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | MoUnemp8 |
| zAFQT89 | -0.91969 |
| | (0.56414) |
| zSES | 1.00393\*\* |
| | (0.50155) |
| zAge | 0.29420 |
| | (0.33111) |
| Constant | -3.16869\*\*\* |
| | (0.72765) |
| Observations | 228 |
| Log Likelihood | -40.50613 |
| Akaike Inf. Crit. | 89.01227 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'unemployed4wks.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'unemployed4wks.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'unemployed4wks.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'unemployed4wks.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'unemployed4wks.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'unemployed4wks.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'unemployed4wks.col'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2625251 | -0.0659163 | -0.0149435 | 0.2189325 | -0.1023747 | 0.2766287 |
| Holdout | 0.2084168 | -0.0772503 | 0.4803922 | 0.2460685 | 0.4547918 | 0.8573214 |
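The training-set bounds in this table are bias-corrected accelerated (BCa) intervals. A hand-rolled sketch of the standard recipe (bias correction `z0` from the bootstrap distribution, acceleration `a` from the jackknife); the report's own implementation is assumed, not shown:

```r
# BCa bootstrap confidence interval for a statistic of a sample x.
# Standard Efron/Tibshirani construction, base R only.
bca_ci <- function(x, stat, B = 2000, alpha = 0.05) {
  theta.hat  <- stat(x)
  theta.boot <- replicate(B, stat(x[sample(length(x), replace = TRUE)]))
  z0 <- qnorm(mean(theta.boot < theta.hat))           # bias correction
  theta.jack <- vapply(seq_along(x), function(i) stat(x[-i]), numeric(1))
  d <- mean(theta.jack) - theta.jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)                  # acceleration
  z <- qnorm(c(alpha / 2, 1 - alpha / 2))
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))    # adjusted quantiles
  quantile(theta.boot, probs = adj, names = FALSE)
}
```

When `z0 = 0` and `a = 0` this reduces to the plain percentile interval, which is what the holdout rows fall back to (note the `bootPercentile.holdout` column names in the code above).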
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'ever_married30', labelAppendix]))
cat('#### Model Parameters')
model <- 'ever_married30'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | WedBy30 |
| zAFQT89 | -0.04736 |
| | (0.07579) |
| zSES | -0.19055\*\* |
| | (0.07863) |
| zAge | 0.20403 |
| | (0.12905) |
| Constant | 1.19841\*\*\* |
| | (0.12890) |
| Observations | 1,634 |
| Log Likelihood | -839.76750 |
| Akaike Inf. Crit. | 1,687.53500 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'ever_married30'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'ever_married30'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'ever_married30'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'ever_married30'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'ever_married30'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'ever_married30'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'ever_married30'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.8116232 | 0.0147045 | 0.0770474 | 0.0735238 | 0.0653208 | 0.1256629 |
| Holdout | 0.7695391 | -0.0430202 | 0.0630608 | 0.0466488 | 0.0623341 | 0.1385667 |
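The table above reports bounds on the Matthews correlation coefficient (MCC) at the optimal cutoff. As a minimal sketch of the metric itself — not the report's internal helper, which additionally bootstraps the classification — MCC can be computed directly from the four cells of a 2x2 confusion matrix:

```r
# Matthews correlation coefficient from confusion-matrix counts.
# Hypothetical counts for illustration; the report's bootstrapped
# values are assumed to come from an equivalent calculation.
mcc <- function(tp, tn, fp, fn) {
  num <- tp * tn - fp * fn
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)  # convention when any margin of the matrix is empty
  num / den
}
mcc(50, 30, 10, 10)  # 1400 / 2400
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect classification), which is why the near-zero values in these tables indicate weak classifiers.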
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'ever_married30.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'ever_married30.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | WedBy30 |
| zAFQT89 | 0.51425*** |
| | (0.15984) |
| zSES | -0.11288 |
| | (0.15828) |
| zAge | 0.36827 |
| | (0.24225) |
| Constant | 1.41495*** |
| | (0.23427) |
| Observations | 605 |
| Log Likelihood | -259.40300 |
| Akaike Inf. Crit. | 526.80590 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
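To read the coefficient table above on the probability scale: the logistic coefficients are log-odds, so a predicted probability comes from summing the linear predictor and applying the inverse logit. A sketch using the coefficients printed above, for a hypothetical subject one standard deviation above the mean on zAFQT89 and average on the other factors:

```r
# Coefficients copied from the reproduced model table above.
coefs <- c(`(Intercept)` = 1.41495, zAFQT89 = 0.51425,
           zSES = -0.11288, zAge = 0.36827)
# Hypothetical standardized inputs: +1 SD AFQT, average SES and age.
newdata <- c(zAFQT89 = 1, zSES = 0, zAge = 0)
logit <- coefs[["(Intercept)"]] + sum(coefs[names(newdata)] * newdata)
prob  <- plogis(logit)  # inverse logit: exp(l) / (1 + exp(l))
```

This mirrors what `predict(glm.model, type = 'response')` would return for the same inputs.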
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'ever_married30.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'ever_married30.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'ever_married30.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'ever_married30.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'ever_married30.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'ever_married30.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'ever_married30.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.7054108 | 0.0942768 | 0.2312680 | 0.1916036 | 0.2092067 | 0.3823478 |
| Holdout | 0.9438878 | -0.1825518 | 0.0336324 | 0.0312788 | 0.0341430 | 0.1174501 |
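The "Optimism Corrected" column above adjusts the apparent (resubstitution) MCC for over-fitting. A minimal sketch of the Efron-style correction, using hypothetical per-replicate performance vectors rather than the report's stored bootstrap results: each bootstrap model is scored on its own resample (`perf.boot`) and on the original data (`perf.orig`), and the mean gap is subtracted from the apparent value.

```r
# Bootstrap optimism correction (sketch with hypothetical inputs).
# apparent:  metric evaluated on the full training sample
# perf.boot: metric of each bootstrap model on its own resample
# perf.orig: metric of each bootstrap model on the original sample
optimism_corrected <- function(apparent, perf.boot, perf.orig) {
  optimism <- mean(perf.boot - perf.orig)  # average over-fit per replicate
  apparent - optimism
}
optimism_corrected(0.23, c(0.30, 0.28, 0.26), c(0.22, 0.20, 0.24))
```

When the bootstrap models barely outperform their own resamples relative to the original data, the optimism is small and the corrected value sits close to the apparent one, as in several tables here.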
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'ever_married30.col', labelAppendix]))
cat('#### Model Parameters')
model <- 'ever_married30.col'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | WedBy30 |
| zAFQT89 | 0.05014 |
| | (0.22375) |
| zSES | 0.09683 |
| | (0.18337) |
| zAge | -0.01778 |
| | (0.29509) |
| Constant | 0.71372* |
| | (0.39462) |
| Observations | 237 |
| Log Likelihood | -145.35750 |
| Akaike Inf. Crit. | 298.71500 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'ever_married30.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'ever_married30.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'ever_married30.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'ever_married30.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'ever_married30.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'ever_married30.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'ever_married30.col'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.6693387 | -0.2084305 | -0.0098520 | 0.1014006 | -0.0863671 | 0.0447238 |
| Holdout | 0.6432866 | -0.1408794 | -0.0482243 | 0.0646447 | -0.0496034 | 0.3081470 |
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'divorced_in5yrs', labelAppendix]))
cat('#### Model Parameters')
model <- 'divorced_in5yrs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: |
|---|---|
| | Div5Yrs |
| zAFQT89 | -0.35734*** |
| | (0.07813) |
| zSES | 0.22195*** |
| | (0.07876) |
| zAge | -0.17767** |
| | (0.07415) |
| MarDate | -0.08677*** |
| | (0.02431) |
| Constant | 5.70861*** |
| | (1.98581) |
| Observations | 2,031 |
| Log Likelihood | -991.37190 |
| Akaike Inf. Crit. | 1,992.74400 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'divorced_in5yrs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'divorced_in5yrs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'divorced_in5yrs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'divorced_in5yrs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'divorced_in5yrs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.1903808 | 0.0691469 | 0.1197176 | 0.1241194 | 0.1122616 | 0.1623367 |
| Holdout | 0.1543086 | 0.0550943 | 0.1618226 | 0.1380496 | 0.1615906 | 0.2112502 |
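The "Optimism Corrected" column follows the usual Efron-style recipe: refit on each bootstrap resample, score that fit on both the resample and the original data, and subtract the average gap (the optimism) from the apparent MCC. A minimal sketch on toy data — `mcc()` is an illustrative helper, not the report's actual code:

```r
# Optimism-corrected MCC: illustrative sketch, not the report's pipeline.
set.seed(1)

mcc <- function(actual, predicted) {
  tp <- sum(actual == 1 & predicted == 1)
  tn <- sum(actual == 0 & predicted == 0)
  fp <- sum(actual == 0 & predicted == 1)
  fn <- sum(actual == 1 & predicted == 0)
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(0)
  (tp * tn - fp * fn) / denom
}

n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))
cutoff <- 0.3

fit <- glm(y ~ x, family = binomial)
apparent <- mcc(y, as.integer(fitted(fit) > cutoff))

B <- 200
optimism <- replicate(B, {
  idx <- sample.int(n, replace = TRUE)
  fit.b <- glm(y[idx] ~ x[idx], family = binomial)
  # MCC of the resampled fit on its own resample...
  mcc.boot <- mcc(y[idx], as.integer(fitted(fit.b) > cutoff))
  # ...minus its MCC on the original data
  p.orig <- plogis(coef(fit.b)[1] + coef(fit.b)[2] * x)
  mcc.boot - mcc(y, as.integer(p.orig > cutoff))
})

mcc.corrected <- apparent - mean(optimism)
```

The correction typically pulls the apparent MCC down, which matches the pattern in the training rows of these tables.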
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
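The "Cutoff Probability" reported above is the probability threshold that maximizes MCC over a grid of candidate cutoffs. A generic sketch of that sweep — toy data, and exploiting the fact that MCC is the phi coefficient, i.e. the Pearson correlation between the 0/1 outcome and the 0/1 classification:

```r
# Sweep cutoffs and pick the MCC-maximizing one (illustrative sketch).
# MCC for two binary vectors equals their Pearson correlation, so cor() suffices.
set.seed(3)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1.5 + x))
p <- fitted(glm(y ~ x, family = binomial))

cutoffs <- seq(0.05, 0.95, by = 0.01)
mcc.by.cutoff <- sapply(cutoffs, function(k) {
  pred <- as.integer(p > k)
  if (length(unique(pred)) < 2) return(NA_real_)  # degenerate split: MCC undefined
  cor(y, pred)
})

cutoff.best <- cutoffs[which.max(mcc.by.cutoff)]
```

Bootstrapping this sweep, rather than running it once, yields the per-cutoff MCC distributions shown in the plots above.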
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'divorced_in5yrs.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'divorced_in5yrs.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: Div5Yrs |
|---|---|
| zAFQT89 | -0.03792 (0.13481) |
| zSES | 0.22069* (0.12882) |
| zAge | -0.10781 (0.11468) |
| MarDate | -0.08400** (0.03832) |
| Constant | 5.44514* (3.12869) |
| Observations | 875 |
| Log Likelihood | -428.70640 |
| Akaike Inf. Crit. | 867.41290 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'divorced_in5yrs.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'divorced_in5yrs.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'divorced_in5yrs.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'divorced_in5yrs.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'divorced_in5yrs.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2004008 | -0.1043035 | 0.0448236 | 0.0829293 | 0.0161811 | 0.0802507 |
| Holdout | 0.1523046 | -0.0941931 | 0.1155382 | 0.0713586 | 0.1158789 | 0.2201537 |
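Note that the training bounds in these tables are bias-corrected accelerated (BCa) intervals, while the holdout bounds are plain bootstrap percentiles. The `boot` package computes both from the same set of replicates; a generic sketch on a deliberately skewed toy statistic (not the report's MCC bootstrap):

```r
# Percentile vs BCa bootstrap intervals with the boot package (toy sketch).
library(boot)
set.seed(2)

x <- rexp(100)                      # skewed toy sample
stat <- function(d, i) mean(d[i])   # statistic: the resampled mean

b <- boot(x, stat, R = 2000)
ci <- boot.ci(b, conf = 0.95, type = c("perc", "bca"))

perc.bounds <- ci$percent[4:5]  # plain percentile bounds
bca.bounds <- ci$bca[4:5]       # BCa bounds, adjusted for bias and skewness
```

On skewed bootstrap distributions the BCa bounds shift relative to the percentile bounds; on symmetric ones the two interval types nearly coincide.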
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'divorced_in5yrs.col', labelAppendix]))
cat('#### Model Parameters')
model <- 'divorced_in5yrs.col'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: Div5Yrs |
|---|---|
| zAFQT89 | -0.75623* (0.45023) |
| zSES | -0.07353 (0.35889) |
| zAge | -0.55879 (0.40470) |
| MarDate | -0.41115** (0.16298) |
| Constant | 32.39423** (13.50911) |
| Observations | 209 |
| Log Likelihood | -48.41447 |
| Akaike Inf. Crit. | 106.82890 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'divorced_in5yrs.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'divorced_in5yrs.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'divorced_in5yrs.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'divorced_in5yrs.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'divorced_in5yrs.col'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.0901804 | -0.0555556 | 0.2374482 | 0.2598149 | 0.1755364 | 0.3716570 |
| Holdout | 0.0721443 | -0.1124844 | 0.2285707 | 0.1731752 | 0.2320964 | 0.4331122 |
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'divorced_in5yrs_parents', labelAppendix]))
cat('#### Model Parameters')
model <- 'divorced_in5yrs_parents'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| | Dependent variable: Div5Yrs |
|---|---|
| zAFQT89 | -0.39256*** (0.07742) |
| zSES | 0.19104** (0.07833) |
| zAge | -0.02781 (0.06177) |
| Adult14S: 2 Bio | -0.31954 (0.20437) |
| Adult14S: Bio/Step | -0.11008 (0.25682) |
| Adult14S: Other | -0.25948 (0.37893) |
| Constant | -1.12299*** (0.19200) |
| Observations | 2,029 |
| Log Likelihood | -994.67710 |
| Akaike Inf. Crit. | 2,003.35400 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs_parents'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'divorced_in5yrs_parents'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'divorced_in5yrs_parents'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'divorced_in5yrs_parents'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'divorced_in5yrs_parents'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'divorced_in5yrs_parents'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'divorced_in5yrs_parents'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.1943888 | 0.0151819 | 0.0909336 | 0.1080654 | 0.0773499 | 0.1202274 |
| Holdout | 0.1623246 | 0.0213149 | 0.1331039 | 0.1118409 | 0.1332435 | 0.1907022 |
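The tables in this section report the Matthews correlation coefficient (MCC) at an optimal probability cutoff. As a minimal illustrative sketch (not the report's own helper functions, and using made-up toy data), the MCC at a given cutoff can be computed like this:

```r
# Sketch: MCC of a binary classifier at a given probability cutoff.
# `probs` and `actuals` below are toy data, not values from the report.
mcc_at_cutoff <- function(probs, actuals, cutoff) {
  pred <- as.integer(probs >= cutoff)
  tp <- sum(pred == 1 & actuals == 1)
  tn <- sum(pred == 0 & actuals == 0)
  fp <- sum(pred == 1 & actuals == 0)
  fn <- sum(pred == 0 & actuals == 1)
  denom <- sqrt(tp + fp) * sqrt(tp + fn) * sqrt(tn + fp) * sqrt(tn + fn)
  if (denom == 0) return(0)  # common convention when a margin is empty
  (tp * tn - fp * fn) / denom
}

probs   <- c(0.05, 0.20, 0.40, 0.70, 0.90, 0.10)
actuals <- c(0, 0, 1, 1, 1, 0)
mcc_at_cutoff(probs, actuals, 0.30)  # 1: this cutoff separates the toy classes perfectly
```

Unlike raw accuracy, MCC stays informative for rare outcomes such as divorce or incarceration, which is presumably why it is used here as the cutoff-selection criterion.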
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'surveyed_in_jail', labelAppendix]))
cat('#### Model Parameters')
model <- 'surveyed_in_jail'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
|  | Dependent variable: |
|---|---|
|  | Jail |
| zAFQT89 | -0.89670*** |
|  | (0.17536) |
| zSES | -0.15554 |
|  | (0.18062) |
| zAge | 0.07830 |
|  | (0.14687) |
| Constant | -3.77722*** |
|  | (0.17180) |
| Observations | 1,945 |
| Log Likelihood | -219.90130 |
| Akaike Inf. Crit. | 447.80250 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
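A table like the one above comes from a binomial logistic regression rendered with stargazer. The following sketch shows the shape of that fit; the variable names are taken from the table, but the simulated data here only stands in for the NLSY-derived training set, so the coefficients will not match:

```r
# Sketch: fitting a logistic model of the jail indicator on standardized
# predictors. Data is simulated; names mirror the table above by assumption.
set.seed(42)
n  <- 1945
DT <- data.frame(zAFQT89 = rnorm(n), zSES = rnorm(n), zAge = rnorm(n))
DT$Jail <- rbinom(n, 1, plogis(-3.8 - 0.9 * DT$zAFQT89))  # toy response
fit <- glm(Jail ~ zAFQT89 + zSES + zAge,
           family = binomial(link = 'logit'), data = DT)
summary(fit)
# stargazer(fit, type = 'html', digits = 5) would render the HTML table
```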
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'surveyed_in_jail'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'surveyed_in_jail'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'surveyed_in_jail'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'surveyed_in_jail'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'surveyed_in_jail'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'surveyed_in_jail'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'surveyed_in_jail'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.0260521 | 0.1043961 | 0.1638284 | 0.1611350 | 0.1565315 | 0.2063444 |
| Holdout | 0.0380762 | 0.1237940 | 0.2362658 | 0.2136281 | 0.2365009 | 0.2980373 |
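The "Optimism Corrected" column follows the Efron/Harrell bootstrap-optimism idea: refit on bootstrap resamples, measure how much better each refit scores on its own resample than on the original data, and subtract that average optimism from the apparent score. A toy sketch (accuracy stands in for MCC; all names here are illustrative, not the report's functions):

```r
# Sketch of bootstrap optimism correction for a performance metric.
set.seed(1)
d <- data.frame(x = rnorm(300))
d$y <- rbinom(300, 1, plogis(d$x))
fit    <- function(dat) glm(y ~ x, family = binomial, data = dat)
metric <- function(m, dat)                      # accuracy as a stand-in metric
  mean((predict(m, dat, type = 'response') > 0.5) == dat$y)
apparent <- metric(fit(d), d)
optimism <- mean(replicate(100, {
  b  <- d[sample(nrow(d), replace = TRUE), ]
  mb <- fit(b)
  metric(mb, b) - metric(mb, d)                 # bootstrap apparent minus test-on-original
}))
apparent - optimism                             # optimism-corrected estimate
```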
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'surveyed_in_jail.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'surveyed_in_jail.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
|  | Dependent variable: |
|---|---|
|  | Jail |
| zAFQT89 | -1.07007** |
|  | (0.44312) |
| zSES | -0.16212 |
|  | (0.46430) |
| zAge | 0.46727 |
|  | (0.36775) |
| Constant | -4.96579*** |
|  | (0.48063) |
| Observations | 716 |
| Log Likelihood | -39.85058 |
| Akaike Inf. Crit. | 87.70117 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'surveyed_in_jail.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'surveyed_in_jail.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'surveyed_in_jail.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'surveyed_in_jail.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'surveyed_in_jail.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'surveyed_in_jail.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'surveyed_in_jail.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.2244489 | -0.0104750 | 0.3518185 | 0.3445395 | 0.2199638 | 0.5765422 |
| Holdout | 0.0040080 | -0.1516913 | 0.0854826 | 0.0422934 | 0.0852632 | 0.1890059 |
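Note the two interval types in these tables: bias-corrected accelerated (BCa) bounds on the training side and plain bootstrap-percentile bounds on the holdout side. The `boot` package computes both from the same set of resamples; a minimal sketch for a sample mean (toy data, not the report's statistic):

```r
library(boot)
# Sketch: percentile vs BCa bootstrap intervals from one set of resamples.
set.seed(1)
x  <- rexp(200)
b  <- boot(x, statistic = function(d, i) mean(d[i]), R = 2000)
ci <- boot.ci(b, conf = 0.95, type = c("perc", "bca"))
ci  # BCa shifts and rescales the percentile bounds to correct bias and skew
```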
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'middle_class_values', labelAppendix]))
cat('#### Model Parameters')
model <- 'middle_class_values'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
|  | Dependent variable: |
|---|---|
|  | MCV |
| zAFQT89 | 0.63251*** |
|  | (0.05282) |
| zSES | 0.24496*** |
|  | (0.05206) |
| zAge | 0.00664 |
|  | (0.04019) |
| Constant | -0.06385 |
|  | (0.03893) |
| Observations | 3,029 |
| Log Likelihood | -1,937.43300 |
| Akaike Inf. Crit. | 3,882.86600 |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'middle_class_values'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'middle_class_values'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'middle_class_values'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'middle_class_values'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'middle_class_values'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'middle_class_values'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'middle_class_values'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.4348697 | 0.2339730 | 0.2710309 | 0.2731777 | 0.2692391 | 0.3033458 |
| Holdout | 0.4709419 | 0.3932675 | 0.4511110 | 0.4506696 | 0.4512280 | 0.5077860 |
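The frequency-vs-probability plots throughout this appendix are calibration checks: band observations along a predictor, then compare the observed response frequency in each band with the mean modeled probability. A toy sketch of the idea (the `bandingPrecision = 10` argument suggests decile bands, which is an assumption; the report's own plotting helpers are not reproduced here):

```r
# Sketch: banded calibration check of modeled probabilities (toy data).
set.seed(1)
z <- rnorm(1000)
y <- rbinom(1000, 1, plogis(-1 + 0.8 * z))
p <- plogis(-1 + 0.8 * z)                        # modeled probabilities
band <- cut(z, quantile(z, 0:10 / 10), include.lowest = TRUE)
tab <- aggregate(cbind(observed = y, modeled = p), list(decile = band), mean)
tab  # observed frequency should track modeled probability band by band
```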
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'middle_class_values.hs', labelAppendix]))
cat('#### Model Parameters')
model <- 'middle_class_values.hs'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| Dependent variable: | |
| MCV | |
| zAFQT89 | 0.16815* |
| (0.09312) | |
| zSES | -0.17993** |
| (0.09034) | |
| zAge | 0.01888 |
| (0.06218) | |
| Constant | 0.39448*** |
| (0.06118) | |
| Observations | 1,162 |
| Log Likelihood | -781.85690 |
| Akaike Inf. Crit. | 1,571.71400 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'middle_class_values.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'middle_class_values.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'middle_class_values.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'middle_class_values.hs'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'middle_class_values.hs'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'middle_class_values.hs'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'middle_class_values.hs'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.5771543 | 0.0140331 | 0.0845855 | 0.0777591 | 0.0657706 | 0.1559252 |
| Holdout | 0.5430862 | -0.1051121 | 0.1361782 | 0.0346021 | 0.1350699 | 0.1646227 |
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
cat(paste0('### ', tbc.models[label == 'middle_class_values.col', labelAppendix]))
cat('#### Model Parameters')
model <- 'middle_class_values.col'
title.model <- paste0(
'Reproduced TBC Probability Model: '
, tbc.models[label == model, labelAppendix]
)
target.model <- tbc.models[label == model, response]
stargazer(
outputs[label == model, glm.model][[1]]
, type = 'html'
, title = title.model
, dep.var.labels = target.model
, digits = 5
)
| Dependent variable: | |
| MCV | |
| zAFQT89 | 0.39251** |
| (0.19881) | |
| zSES | 0.03692 |
| (0.16858) | |
| zAge | 0.13876 |
| (0.13364) | |
| Constant | 0.99516*** |
| (0.23868) | |
| Observations | 402 |
| Log Likelihood | -200.09150 |
| Akaike Inf. Crit. | 408.18290 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
cat('#### Plotting Modeled Probabilities (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'middle_class_values.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
model <- 'middle_class_values.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Training Data)
cat('##### zAge')
model <- 'middle_class_values.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Probabilities (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
model <- 'middle_class_values.col'
factorSelection <- 'zAFQT89'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
model <- 'middle_class_values.col'
factorSelection <- 'zSES'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
model <- 'middle_class_values.col'
factorSelection <- 'zAge'
plotFrequencyVsProbability.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = inputs[[model]][['model parameters']][, response]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = inputs[[model]][['model parameters']][, target.descriptions.formatted]
)
Plot of observed vs bootstrapped modeled probabilities across zAge (Holdout Data)
cat('#### Plotting MCC Distributions{.tabset .tabset-fade}')
model <- 'middle_class_values.col'
kable(
data.table(
Dataset = c('Training', 'Holdout')
, `Cutoff Probability` = c(
outputs[label == (model), cutoff.mcc.training][[1]]
, outputs[label == (model), cutoff.mcc.holdout][[1]]
)
, `Lower Bound` = c(
outputs[label == (model), mcc.bootstrap.lower.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.lower.bootPercentile.holdout][[1]]
)
, `Apparent` = c(
outputs[label == (model), mcc.apparent.training][[1]]
, outputs[label == (model), mcc.apparent.holdout][[1]]
)
, `Bootstrapped Mean` = c(
outputs[label == (model), mcc.bootstrap.training][[1]]
, outputs[label == (model), mcc.bootstrap.holdout][[1]]
)
, `Optimism Corrected` = c(
outputs[label == (model), mcc.optimism.corrected.training][[1]]
, outputs[label == (model), mcc.optimism.corrected.holdout][[1]]
)
, `Upper Bound` = c(
outputs[label == (model), mcc.bootstrap.upper.BCa.training][[1]]
, outputs[label == (model), mcc.bootstrap.upper.bootPercentile.holdout][[1]]
)
)
, align = rep('c', 7)
) %>%
kable_styling(bootstrap_options = "striped", full_width = F) %>%
add_header_above(c(' ' = 3, 'Optimal MCC' = 3, ' ' = 1))
| Dataset | Cutoff Probability | Lower Bound | Apparent | Bootstrapped Mean | Optimism Corrected | Upper Bound |
|---|---|---|---|---|---|---|
| Training | 0.7374749 | -0.0541101 | 0.1543930 | 0.1641656 | 0.1206757 | 0.2655808 |
| Holdout | 0.6773547 | -0.1091326 | -0.0511336 | 0.0724688 | -0.0597152 | 0.3261558 |
cat('##### Training')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.training[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Training Data)
cat('##### Holdout')
plotPerformanceMetric.cutoff.bootCI(
DT.bootCI = DT.mcc.bootCI.holdout[label == model]
, performanceMetric = 'mcc'
, performanceMetric.boot = 'mcc.bootstrap'
, cutoff.name = 'cutoff'
, lowerCI.name = 'lower'
, upperCI.name = 'upper'
)
Plot of bootstrapped MCC distributions by cutoff (Holdout Data)
cat('#### Plotting Modeled Classifications (Training Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Training Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Training Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.training']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.training[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.training[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Training Data)
cat('#### Plotting Modeled Classifications (Holdout Data){.tabset .tabset-fade}')
cat('##### zAFQT89')
factorSelection <- 'zAFQT89'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAFQT89 (Holdout Data)
cat('##### zSES')
factorSelection <- 'zSES'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zSES (Holdout Data)
cat('##### zAge')
factorSelection <- 'zAge'
plotFrequencyVsClassification.bootCI(
DT.factors = inputs[[model]][['DT.model.holdout']]
, DT.classification.bootCI = DTs.mcc.classification.bootCI.holdout[[model]][[factorSelection]]
, DT.probabilities.bootCI = DTs.probability.bootCI.holdout[[model]][[factorSelection]]
, xString = factorSelection
, yString = outputs[label == model, response]
, cutoff = outputs[label == model, cutoff.mcc.training]
, key = 'ID'
, bandingPrecision = 10
, alpha = 0.05
, response.label = outputs[label == model, target.descriptions.formatted]
)
Plot comparing bootstrapped classifications and modeled probabilities across zAge (Holdout Data)
DT.compare.bootCI <- outputs[
, .(
label = factor(label
, levels = rev(levels(label))
)
, mcc.bootstrap.lower.BCa.training, mcc.apparent.training, mcc.bootstrap.upper.BCa.training
, lower.mcc.bootPackage, optimal.mcc.bootPackage, upper.mcc.bootPackage
, lb.diff = mcc.bootstrap.lower.BCa.training - lower.mcc.bootPackage
, ub.diff = mcc.bootstrap.upper.BCa.training - upper.mcc.bootPackage
)
][, maxAbs.diff := pmax(abs(lb.diff), abs(ub.diff))]
DT.compare.bootCI.melt <- melt(
DT.compare.bootCI
, id.vars = c('label', 'lb.diff', 'ub.diff', 'maxAbs.diff')
, measure.vars = patterns('lower','(apparent|optimal)','upper')
, variable.name = 'implementation'
, value.name = c('lower', 'estimate', 'upper')
)
DT.compare.bootCI.melt[, label := factor(label, levels = DT.compare.bootCI[order(-maxAbs.diff), label], ordered = TRUE)]
implementation.names <- c('Custom BCa for data.table', 'boot package')
DT.compare.bootCI.melt[
, implementation := factor(
implementation.names[c(rep(1,nrow(DT.compare.bootCI)), rep(2,nrow(DT.compare.bootCI)))]
, levels = implementation.names
)
]
One way to test whether the results of this analysis are reasonable is to reproduce part of it using the boot package, which facilitates bootstrapping and can calculate BCa intervals. In the script used to store the project’s results (i.e., TBC_Bootstrap.R), the boot::boot function performs the following steps:

1. Fits a probability model to a bootstrapped version of each model’s training data
2. Creates a classification using that model’s fitted values and the optimal cutoff value calculated from the custom bootstrap functions used in this project
3. Calculates the MCC value for that bootstrapped model
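As a minimal sketch of this workflow (toy data and a toy statistic, not the project's actual refit-and-score routine), boot::boot resamples the data and boot::boot.ci extracts a BCa interval:

```r
# Sketch only: the statistic here is a simple mean; in TBC_Bootstrap.R the
# statistic refits the model and computes the MCC instead.
library(boot)

set.seed(42)
toy <- rnorm(200, mean = 0.3)

# boot() calls statistic(data, indices) once per resample
boot.out <- boot(
  data = toy
  , statistic = function(d, i) mean(d[i])
  , R = 10000
)

# type = 'bca' requests the bias-corrected and accelerated interval
boot.ci(boot.out, conf = 0.95, type = 'bca')
```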
The plot and table below compare the intervals from this analysis with those from the boot package. The MCC estimates are identical and the BCa intervals tend to agree very well. The model labels are ordered by the largest absolute difference in BCa bounds. E.g., if the difference in lower BCa bounds for the Get GED model is 3.8666946 × 10^-5, and for the upper BCa bounds 4.7202846 × 10^-4, then the largest difference in BCa bounds for this model is 4.7202846 × 10^-4. The model with the most extreme difference in BCa bounds is Unemployed (Highschool Subset) with a value of 0.0393696.
ggplot(
DT.compare.bootCI.melt
, aes(
x = label
, ymin = lower
, y = estimate
, ymax = upper
, color = implementation
, group = implementation
)
) +
geom_point() +
geom_errorbar(alpha = 0.5) +
theme_bw() +
coord_flip()
Plot comparing bootstrapped MCC estimates and confidence intervals measured by this analysis and the boot package
kable(
DT.compare.bootCI[tbc.models[, .(label, labelAppendix)], on = 'label'][order(maxAbs.diff)][
, .(
`Model` = labelAppendix
, `Largest Difference in BCa bounds` = maxAbs.diff
)
]
, align = c('l', 'c')
, caption = 'Table comparing bootstrapped MCC estimates and confidence intervals measured by this analysis and the boot package'
) %>% kable_styling(
bootstrap_options = "striped"
, full_width = FALSE
, fixed_thead = TRUE
)
| Model | Largest Difference in BCa bounds |
|---|---|
| Get Bachelors | 0.0004678 |
| Get GED | 0.0004720 |
| Out of the Labor Force | 0.0006093 |
| “Middle Class Values” (College Subset) | 0.0007759 |
| Dropout | 0.0011253 |
| Ever Married | 0.0011459 |
| Poverty (Highschool Subset) | 0.0013201 |
| “Middle Class Values” | 0.0019633 |
| Unemployed | 0.0022801 |
| Divorced in 1st 5 Years of Marriage | 0.0022979 |
| “Middle Class Values” (Highschool Subset) | 0.0023998 |
| Out of the Labor Force (Highschool Subset) | 0.0030471 |
| Dropout (With Interaction) | 0.0035205 |
| Ever Married (Highschool Subset) | 0.0036973 |
| Poverty | 0.0040212 |
| Divorced in 1st 5 Years of Marriage (With Parents Factors) | 0.0050280 |
| Surveyed in Jail | 0.0066849 |
| Divorced in 1st 5 Years of Marriage (College Subset) | 0.0177187 |
| Unemployed (College Subset) | 0.0181666 |
| Out of the Labor Force (College Subset) | 0.0227026 |
| Divorced in 1st 5 Years of Marriage (Highschool Subset) | 0.0301332 |
| Surveyed in Jail (Highschool Subset) | 0.0336793 |
| Ever Married (College Subset) | 0.0377535 |
| Unemployed (Highschool Subset) | 0.0393696 |
Much more can be said about these models, but that would lengthen this already long report!
Future work could involve the remaining ~12 TBC models, but I don’t expect to have time to do so.
I’m aware that bootstrap percentile confidence intervals have known shortcomings (e.g., they can undercover when the bootstrap distribution is biased or skewed). However, I’ve included them for the holdout data as a soft comparison with the results from the training data. The holdout data are already much thinner than the training data for each model.
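For reference, a percentile interval is just the empirical quantiles of the bootstrapped statistic (toy numbers below, not this project's data):

```r
# Sketch: 95% bootstrap percentile interval for a vector of bootstrapped
# MCC values (here simulated as a stand-in).
mcc.boot <- rnorm(10000, mean = 0.45, sd = 0.03)
alpha <- 0.05
quantile(mcc.boot, probs = c(alpha / 2, 1 - alpha / 2))
```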
The TBC_Bootstrap procedure does not set a random seed for the boot package validation work. A future version could set one and rerun the procedure, though I don’t think it would change any conclusions of this report.
I chose 10,000 bootstrap iterations because it comfortably exceeds what seemed necessary. Preliminary work with the poverty model (not included here) suggested that even 5,000 iterations was adequate. I wanted enough iterations that a reader wouldn’t discredit the conclusions for having too few. A section on the convergence of the reported values could be added, but that would take more development time than I have.
For completeness, it would be nice to explain BCa intervals, but I think that gets too detailed and distracts from the rest of this report.
I chose not to use the boot package for the main analysis because I wanted to better understand bootstrapping. Also, I couldn’t get the boot function to work in parallel (on Windows 10), which greatly extended the overall runtime!
The suggested table for interpreting correlation values is only used as a common reference for myself and the reader. Interpretation is always subjective.
It’s odd that some of the MCC distributions don’t have local optima. E.g., see the plot for Unemployed (Highschool Subset). Inspecting the data and code, I still can’t explain it.
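To make the shape of those distributions concrete, here is a sketch of the MCC-versus-cutoff curve on toy data (the function below is my own small implementation of the standard MCC formula, not code from this project). For a reasonably calibrated model the curve typically peaks at an interior cutoff; for some TBC models it does not:

```r
# Matthews correlation coefficient for a probability vector p classified
# at a given cutoff; denominator factored to reduce overflow risk.
mcc <- function(y, p, cutoff) {
  yhat <- as.integer(p >= cutoff)
  tp <- sum(yhat == 1 & y == 1); tn <- sum(yhat == 0 & y == 0)
  fp <- sum(yhat == 1 & y == 0); fn <- sum(yhat == 0 & y == 1)
  denom <- sqrt(tp + fp) * sqrt(tp + fn) * sqrt(tn + fp) * sqrt(tn + fn)
  if (denom == 0) return(0)
  (tp * tn - fp * fn) / denom
}

set.seed(1)
p <- runif(1000)            # toy modeled probabilities
y <- rbinom(1000, 1, p)     # toy outcomes consistent with p
cutoffs <- seq(0.05, 0.95, by = 0.05)
sapply(cutoffs, function(k) mcc(y, p, k))
```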
I included bootstrap optimism because I wanted to account for sampling bias, especially for models having thin training data. I also use it to give the results a bit more credibility by estimating and removing (a type of) bias. Looking at the Table of Optimal MCC Values, only a few models had noticeably large optimism estimates, and most of those had thin training data (e.g., highschool or college subsets).
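The optimism correction follows the usual bootstrap recipe: refit on each resample, score it on both the resample (apparent) and the original data, and subtract the mean gap from the full-sample apparent performance. A minimal sketch, where fit.fun and score.fun are hypothetical stand-ins for this project's model-fitting and MCC-scoring functions:

```r
# Sketch of the optimism bootstrap (Efron-style); DT is a data.table.
# fit.fun(DT) returns a fitted model; score.fun(model, DT) returns a
# performance metric such as the MCC. Both are placeholders.
optimism.corrected <- function(DT, fit.fun, score.fun, R = 1000) {
  apparent <- score.fun(fit.fun(DT), DT)
  optimism <- mean(replicate(R, {
    DT.boot <- DT[sample(.N, replace = TRUE)]
    model.boot <- fit.fun(DT.boot)
    # apparent performance on the resample minus performance on the
    # original data estimates the over-fitting bias
    score.fun(model.boot, DT.boot) - score.fun(model.boot, DT)
  }))
  apparent - optimism
}
```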
The code needs better documentation, especially regarding parameters and returned values.
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-2 kableExtra_1.1.0 stargazer_5.2.2
## [4] knitr_1.20 plotly_4.8.0 boot_1.3-20
## [7] snow_0.4-3 cowplot_0.9.3 ggplot2_3.1.0
## [10] data.table_1.11.8 RevoUtils_11.0.1 RevoUtilsMath_11.0.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_0.2.5 purrr_0.2.5 colorspace_1.3-2
## [4] htmltools_0.3.6 viridisLite_0.3.0 yaml_2.2.0
## [7] rlang_0.3.0.1 later_0.7.5 pillar_1.3.0
## [10] glue_1.3.0 withr_2.1.2 bindrcpp_0.2.2
## [13] bindr_0.1.1 plyr_1.8.4 stringr_1.3.1
## [16] munsell_0.5.0 gtable_0.2.0 rvest_0.3.2
## [19] htmlwidgets_1.3 evaluate_0.12 labeling_0.3
## [22] httpuv_1.4.5 crosstalk_1.0.0 parallel_3.5.1
## [25] highr_0.7 Rcpp_1.0.0 xtable_1.8-3
## [28] readr_1.1.1 promises_1.0.1 scales_1.0.0
## [31] backports_1.1.2 webshot_0.5.0 jsonlite_1.5
## [34] mime_0.6 hms_0.4.2 digest_0.6.18
## [37] stringi_1.2.4 shiny_1.2.0 dplyr_0.7.8
## [40] grid_3.5.1 rprojroot_1.3-2 tools_3.5.1
## [43] magrittr_1.5 lazyeval_0.2.1 tibble_1.4.2
## [46] crayon_1.3.4 tidyr_0.8.2 pkgconfig_2.0.2
## [49] xml2_1.2.0 assertthat_0.2.0 rmarkdown_1.10
## [52] httr_1.3.1 rstudioapi_0.7 R6_2.2.2
## [55] compiler_3.5.1
RStudio version 1.2.1335
R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Matt Dowle and Arun Srinivasan (2018). data.table: Extension of data.frame. R package version 1.11.8. https://CRAN.R-project.org/package=data.table
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Luke Tierney, A. J. Rossini, Na Li and H. Sevcikova (2018). snow: Simple Network of Workstations. R package version 0.4-3. https://CRAN.R-project.org/package=snow
Claus O. Wilke (2018). cowplot: Streamlined Plot Theme and Plot Annotations for ‘ggplot2’. R package version 0.9.3. https://CRAN.R-project.org/package=cowplot
Yihui Xie (2018). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.20.
Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963
Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package version 1.1-2. https://CRAN.R-project.org/package=RColorBrewer
Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.1. https://CRAN.R-project.org/package=stargazer
Hao Zhu (2018). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 0.9.0. https://CRAN.R-project.org/package=kableExtra
RStudio Team (2016). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/.
Herrnstein and Murray. (1994). The Bell Curve. New York: The Free Press. 1st paperback edition 1996. ISBN-13: 978-0-684-82429
Bureau of Labor Statistics, U.S. Department of Labor. National Longitudinal Survey of Youth 1979 cohort, 1979-2012 (rounds 1-25). Produced and distributed by the Center for Human Resource Research, The Ohio State University. Columbus, OH: 2014.
Prepared for the U.S. Department of Labor by Center for Human Resource Research, The Ohio State University. (2001). NLSY79 users’ guide: a guide to the 1979-2000 National Longitudinal Survey of Youth data. Columbus, Ohio: Center for Human Resource Research, Ohio State University.
Heckman, J. (1995). Lessons from The Bell Curve. Journal of Political Economy, 103(5), 1091–1120.
Krenz, C. (2007, August 8). Anatomy of an Analysis. Retrieved from http://www.claudiax.net/bell.html.
Shmueli, G. (2010). To Explain or To Predict? Statistical Science, 25(3), 289–310.
Miele, F. (1995). For Whom The Bell Curve Tolls. Skeptic, Volume 3, #3, 34–41.
Mukaka MM. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24(3):69-71.
Boughorbel S, Jarray F, El-Anbari M (2017). Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS ONE 12(6).
Efron B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78:316-331.
Harrell, F. E., Lee, K. L., & Mark, D. B. (1996). Tutorial in Biostatistics: Multivariable prognostic models. Statistics in Medicine, 15:361-387.
Herrnstein and Murray (1994). Nation.txt, Nation.hdr, and 2TBC_Documentation.ascii. Retrieved from http://www.rasmusen.org/xpacioli/bellcurve.
Bradley Efron (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association, 82:397, 171-185. DOI: 10.1080/01621459.1987.10478410.